GitHub's AI Powered Security Platform: Sarah Khalife

00:00:00.000 |
Thank you all for joining today. Thank you for staying if you stayed from the previous session 00:00:17.480 |
as well. I know there were a bunch of GitHub talks, so I'm very excited that you got to hear 00:00:24.000 |
from all of us from all different places within the company, but also from all different teams. 00:00:28.800 |
So, you heard a lot about Copilot, a lot, a lot about Copilot. Copilot today, Copilot 00:00:34.800 |
tomorrow, Copilot in the next year, maybe in a couple years. So, from Workspace to GitHub 00:00:39.700 |
Copilot Enterprise to Copilot futures, there's so much more going on with Copilot 00:00:44.800 |
itself. But what we really want to actually do within GitHub is not only incorporate some 00:00:50.160 |
of the new features and capabilities within Copilot, but we also want to incorporate 00:00:54.400 |
AI as part of the platform. So, everything within the platform itself will start having 00:00:59.360 |
AI in it. So, today, you've heard a lot about Copilot. Obviously, we're at an AI conference. 00:01:03.800 |
So, if you didn't, that would have been a bad thing. But with collaboration, productivity and 00:01:08.800 |
security, those are some of our biggest aspects of the GitHub core platform capabilities there. 00:01:14.400 |
From pull requests, as you heard some of the code review things that were happening earlier on. But also from 00:01:19.360 |
productivity perspective, how do you gain momentum and get more developers excited to do their work across 00:01:24.800 |
their developer platforms, especially if they're using GitHub Copilot. How do you get that momentum going 00:01:30.800 |
from a productivity perspective? And then, security. Security, I feel like, hasn't been talked about as much in terms of 00:01:37.360 |
how can we improve AppSec with AI rather than just how do we talk about security around AI. Security around AI is 00:01:46.320 |
probably something talked about all the time from legal to privacy to everything in between. But what we want to 00:01:53.280 |
do at GitHub is to also incorporate AI to make sure that security is also being progressed along with the new 00:01:59.600 |
capabilities that we're seeing today. And everything that GitHub does, we want to do it with scale. We want to build our 00:02:06.000 |
integrations and our APIs. So anything that I've talked about today is generally API backed or there's a third party integrator that 00:02:12.320 |
also comes in and helps with a lot of the capabilities. There's never one path that 00:02:16.800 |
solves all the answers. So we want to make sure that all of our customers, you know, have a 00:02:22.720 |
path to that solution. Today, if you didn't know, and actually, these might not be the latest numbers. I 00:02:30.880 |
think the numbers are higher. I haven't updated the slide in a while. But there's over 100 million 00:02:35.760 |
developers registered on GitHub. There's 4 million organizations, 90% of the Fortune 100 companies, 00:02:42.880 |
which sometimes I'm surprised by the numbers because I know everybody uses GitHub. But like 90% of the 00:02:48.560 |
Fortune 100 is mind blowing. I work with customers day in, day out. But this is still mind blowing to see, 00:02:54.720 |
because that is how we get a lot of our feedback. That's how we get a lot of our community part of the 00:02:59.760 |
conversation. That's how we build a lot of our features. Anything that we do, we incorporate that community 00:03:05.040 |
back into the conversation. By way of introduction, I'm Sarah Khalife. I'm a principal 00:03:11.600 |
solutions engineer here at GitHub. I've been at GitHub for about four and a half years now, which is a 00:03:17.360 |
long time for GitHub's life. And it was pre-Copilot, pre some of the capabilities, almost pre-Actions, 00:03:26.240 |
if you use Actions. So, you know, we've seen the platform grow a lot during the last couple of years. 00:03:31.920 |
But like I said, the community is what really makes it the bigger, better platform across the board. 00:03:37.120 |
Our customers, our vendors, our partners, everybody that is part of that conversation is how we really get 00:03:42.960 |
into the next iteration of what we're going to build. And I'm lucky enough to work with a lot of 00:03:48.160 |
customers. So I'm working on the customer side of things. All my customers today are financial services, 00:03:52.720 |
but over the last four and a half years, I've been working with all types of customers from all 00:03:57.280 |
different industries in this enterprise world that we live in. 00:04:01.920 |
So I did a quick introduction of what is the GitHub platform today. We talked about 00:04:08.800 |
Copilot, but that's only one aspect. We talked a little about collaboration. You heard about it earlier 00:04:13.360 |
on through Christina and Chris's session. You also heard a lot about productivity, especially with 00:04:18.800 |
Copilot being in that conversation. But today we're going to talk about GitHub Advanced Security and 00:04:24.000 |
how we're incorporating AI into that. If you haven't heard of GitHub Advanced Security, don't worry, 00:04:29.200 |
I'm going to cover what GitHub Advanced Security is. And then we're going to talk about some of the new 00:04:33.600 |
features coming along that bring AI into Advanced Security. And then we'll do a live demo, 00:04:38.240 |
because as you can see, all of our GitHub teams here love doing live demos, even if they don't work, 00:04:42.640 |
or if they work better than they expect, which is a very great example of what happened in the earlier session today. 00:04:48.160 |
Okay. So we'll be focusing on one aspect of the platform, and then we'll be talking about 00:04:55.360 |
specifically security. So how can we incorporate security in our day-to-day work? And then how can 00:04:59.760 |
AI improve that experience for developers? Who here has used GitHub Advanced Security? Anybody? 00:05:06.880 |
Nice. A couple people. So what is GitHub Advanced Security? GitHub Advanced Security allows you to 00:05:14.160 |
incorporate, think of it as your AppSec aspects, into your developer platform day in, day out, with that 00:05:21.360 |
GitHub experience. So when we talk about Advanced Security, we have code scanning. Code scanning allows 00:05:26.320 |
you to do SAST and other types of code scanning within the developer platform, within the GitHub ecosystem, 00:05:32.080 |
to find vulnerabilities and detect different patterns that are vulnerable to then help remediate them 00:05:36.720 |
faster. With code scanning today, there are two aspects that I would say are the more popular aspects 00:05:42.080 |
to talk about. And we can talk about a lot more if you want to come to the Microsoft and GitHub 00:05:46.480 |
booth later on. But code scanning today allows you to detect vulnerabilities using CodeQL, which is our 00:05:52.880 |
internal or proprietary language, I guess, but it's open source. So you can actually build 00:05:57.840 |
your own queries yourself. But with code scanning today, it allows you to detect different vulnerable 00:06:03.440 |
patterns across your code. We'll find the data flow and be able to analyze that data flow to understand 00:06:08.400 |
where the vulnerability is, and what type of source and exploits you have within that vulnerability. 00:06:14.000 |
So that makes it super powerful, especially when we talk about some of the AI aspects that we're adding 00:06:18.240 |
to it, because we have the context and we have that information in there. With CodeQL today, 00:06:23.680 |
there are more than I don't even know how many query packs that we offer internally or from the GitHub 00:06:30.240 |
side. But we also bring in that community. So anything that we do is community backed. And sorry, 00:06:34.880 |
I keep hitting the mic. But anything that we do is community backed. So when we talk about CodeQL, 00:06:39.440 |
it's not just our queries, it's Microsoft's queries, it's Google's queries, it's Uber's queries, and so forth. 00:06:44.240 |
And we build that aspect of community in even in security, just because we know GitHub is never going to 00:06:49.440 |
answer every single question that you may have, especially for the companies 00:06:54.160 |
that you work for. The other aspect of code scanning is to incorporate third party integrations. So again, 00:07:00.000 |
this is where our vendor and partnership comes through. So if you're using something like container 00:07:04.000 |
scanning: code scanning today, CodeQL specifically, is more focused on SAST. We're not going to do 00:07:09.280 |
container scanning today, at least not in the near future. So why not incorporate some of the results 00:07:14.320 |
back into your pull request and get that information and feedback sooner than later. So that's an aspect 00:07:20.080 |
of third party integrations. Container 00:07:25.360 |
scanning, infrastructure-as-code scanning, or any other third parties that you want to integrate with, 00:07:29.120 |
you can do that through SARIF inputs through code scanning. The biggest win, which all my customers have 00:07:35.440 |
loved, and I don't know if you've felt it even if you're doing open source work, is secret scanning. Secret 00:07:41.360 |
scanning is a lifesaver in many, many cases. How many times has somebody accidentally left some, 00:07:48.400 |
you know, secrets, maybe AWS or Azure keys, in a log file without realizing and, you know, 00:07:56.640 |
it happens. You do a test file and you forget to add it to your .gitignore. It saves a lot of 00:08:03.600 |
Bitcoin miners from being spun up within your environment. So secret scanning does a full-blown analysis of secrets 00:08:11.040 |
across your repositories, from API keys to your own custom secrets, and, using AI that I'll be talking 00:08:17.680 |
about in a little bit, other types of secrets that are just plain text, where it's really hard 00:08:23.120 |
to reduce the amount of false positives. AI can really help with that. So with secret scanning, 00:08:28.400 |
it's been the biggest win across all of my customers and it's been a big, big discussion on how do we 00:08:33.200 |
prevent things, both reactively and proactively. So proactively what we 00:08:40.080 |
have incorporated is push protection, so it allows you to block any pushes that are coming to the GitHub 00:08:45.600 |
ecosystem before the secret is exposed or before it goes into your Git commit history, so then 00:08:51.440 |
you don't have to revoke it at that point. But what about what's already in your Git commit history? We don't recommend 00:08:55.840 |
deleting anything in your Git commit history especially if you work for a regulated industry 00:09:00.080 |
that goes through audits and has to maintain a lot of historical aspects very precisely. That's where 00:09:06.800 |
we just say, hey, we are able to detect secrets not only in your current state but in all your Git history, 00:09:11.920 |
and also in issues, pull requests, comments on pull requests and so forth. If there are any secrets in 00:09:16.720 |
those, we really recommend revoking them. And last but not least, supply chain security. You 00:09:24.320 |
may have heard of it as Dependabot, which is one of the tools within the supply chain security aspect. 00:09:30.400 |
Dependabot allows you to detect dependencies that are vulnerable today. So with Dependabot, it gives you 00:09:36.160 |
an opportunity to say, hey, I found that these dependencies are 00:09:41.360 |
vulnerable. So maybe we need to upgrade to the latest version. There are going to be some additional AI 00:09:45.840 |
components being added to that as well. But that's something that I won't be covering as much 00:09:49.280 |
today because that's still earlier on in the stages there. Some of our secret scanning partners 00:09:56.240 |
that we work with today are very common vendors that you might be working with throughout 00:10:02.240 |
your day to day. But this is where again, we talk about the community and the vendors and the partners 00:10:07.040 |
that we work with. Because what we do for secret scanning is not only 00:10:10.800 |
incorporate their patterns, but we also push them to improve how they're doing the patterns to make 00:10:16.240 |
sure that their secrets, in general, are not going to create a lot of false 00:10:22.160 |
positives. So we create a kind of mechanism for them to add hashes and more specific 00:10:28.240 |
information to be able to detect these secrets almost at 99%, 99.9 something percent. I don't know the 00:10:34.880 |
exact average that we have today, but it's pretty high up there and it reduces the amount of false 00:10:39.280 |
positives so significantly when you use some of our high fidelity partners that we work with. 00:10:45.280 |
So, at any given point in time, GitHub really believes security should be part of the day-to-day 00:10:53.840 |
responsibilities of everybody. It's a shared responsibility. It's never just AppSec saying, 00:10:58.720 |
hey, you need to fix these 10 vulnerabilities by tomorrow or else we can't deploy. It's never just 00:11:05.360 |
the developer trying to figure out how to fix this vulnerability that they've never even heard of or 00:11:10.240 |
maybe don't even understand, to then be able to deploy on time. So it should be more of a shared 00:11:16.160 |
responsibility. So our goal is to bridge that gap and make that conversation a lot easier. So anything that 00:11:20.640 |
we do with advanced security, anything that we're doing with AI allows you to really add that aspect 00:11:25.280 |
into it. So what can AI do for us? How can AI benefit security? With AI, there's so 00:11:36.320 |
much more that you can do, especially with generative AI, as you can see with Copilot, with all the new 00:11:41.280 |
customers, vendors, partners that you're seeing here at this conference. There's just so many aspects to 00:11:46.240 |
it. So the first couple of things that we've noticed right off the bat is easier identification. 00:11:51.360 |
How can AI help us identify vulnerabilities or secrets much more easily? How can we 00:11:58.160 |
have faster remediation? So when you identify things, if you're not fixing them, then what's the point of 00:12:03.280 |
identifying them half the time, right? If you're not going to fix the vulnerabilities, that's where the 00:12:07.760 |
actual issue is. It's easy to find vulnerabilities lately, a lot easier than it was before, 00:12:14.160 |
but fixing them is the actual issue. And that's where the productivity aspect also comes into play. 00:12:19.360 |
And last but not least, driving that productivity. The faster you're able to fix vulnerabilities, the 00:12:23.920 |
faster you're able to be a little more productive, improve your security risk posture from where 00:12:29.440 |
your company may be today, reduce the amount of, you know, turmoil that you have to hit, you know, 00:12:35.680 |
by deploying earlier on with the fix rather than waiting until production or after production, 00:12:40.720 |
or when a customer is using your product already. But in general, this is where we see AI really 00:12:46.160 |
helping introduce a lot more of those capabilities. So first, maybe not the most important, 00:12:55.520 |
but probably the biggest one and the one that we are very excited for, is code scanning autofix. With code 00:13:02.000 |
scanning autofix, not only are we helping detect vulnerabilities with code scanning, 00:13:07.040 |
but now we're providing a way to autofix those with AI. So in the pull request, as you're working 00:13:13.520 |
actively, it will actually provide a response back to say, hey, maybe you should be fixing this 00:13:19.040 |
vulnerability this specific way. And it'll give you a suggestion. Obviously, it's AI. So it's going to give 00:13:23.840 |
you a suggestion of what it thinks based on the context it has, you can always edit it, you can always fix it, 00:13:28.800 |
or you can commit it and rerun your test and see if it actually fixes that vulnerability. 00:13:32.800 |
With code scanning autofix, CodeQL is what's providing a lot of that context. So CodeQL is 00:13:39.120 |
finding the data flow of that vulnerability, and you're getting information on what that vulnerability's common 00:13:44.400 |
fixes are when you're doing code scanning. So by providing that context in the way that our 00:13:48.960 |
backend system prompts that request, it's actually providing a really, really good autofix 00:13:54.960 |
result. From our customers and from all of my customers that have tested this today, they found 00:14:00.000 |
that autofix has been pretty successful for, I don't know, maybe 70% of their use cases. But again, this is 00:14:06.800 |
going to only get better as we're working with more customers as more people are starting to test this 00:14:11.760 |
out. And it's in public beta today, so you can actually test this out yourself if you're interested. 00:14:15.600 |
Second to this is the secret scanning improvements. Creating a custom pattern 00:14:24.000 |
requires a lot of work. I mean, I don't know, who knows regex to the point where they feel 00:14:30.880 |
confident rolling out a regex scan across all their repositories, right? I personally cannot 00:14:39.680 |
claim that I do. My regex, I mean, it's not bad, but it's not the best that it can be. So why not 00:14:45.600 |
have AI help us custom generate those regexes? So with custom pattern generation, you can use AI 00:14:51.760 |
capabilities to maybe suggest a different way to write some of your regexes. So you can provide samples 00:14:58.320 |
and examples of what you're looking for. And then our secret scanning custom pattern 00:15:04.080 |
generation would generate a custom pattern for you to at least have a starting point if you don't think 00:15:09.280 |
that's the full answer just yet. But it generates that custom pattern. And it makes it so much easier 00:15:14.480 |
to give more and more examples because the more context it has, the better answer it will provide. 00:15:18.720 |
And it will generate a response back so you don't have to figure out how to 00:15:22.320 |
write this or search for this type of regex to find a custom pattern. 00:15:25.760 |
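To make the idea of a custom pattern concrete, here is a minimal sketch in Python of a regex checked against sample strings, which is roughly the loop you would otherwise do by hand. Everything specific here is an assumption for illustration: the `acme_` token format, the sample strings, and the `find_secrets` helper are hypothetical, and this is not how GitHub generates or evaluates patterns.

```python
import re

# Hypothetical internal token format: "acme_" followed by exactly 32 lowercase hex characters.
CUSTOM_PATTERN = re.compile(r"\bacme_[0-9a-f]{32}\b")


def find_secrets(text: str) -> list[str]:
    """Return every substring of `text` that matches the custom pattern."""
    return CUSTOM_PATTERN.findall(text)


# Samples the pattern is expected to catch.
should_match = ["token = acme_" + "0123456789abcdef" * 2]

# Samples the pattern is expected to ignore: too short, wrong case, non-hex characters.
should_not_match = ["acme_short", "ACME_" + "0" * 32, "acme_" + "g" * 32]

for sample in should_match:
    assert find_secrets(sample), f"missed: {sample}"
for sample in should_not_match:
    assert not find_secrets(sample), f"false positive: {sample}"
```

The more positive and negative samples you keep next to a pattern like this, the easier it is to tell whether a suggested regex is too loose or too strict before rolling it out.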
So this has simplified the process so much. And a lot of our customers have loved, loved, 00:15:31.200 |
loved having this capability because they were doing that anyway, probably on ChatGPT, 00:15:35.920 |
or maybe going to Copilot in their IDE, or maybe they were doing this on Google and trying to figure 00:15:40.400 |
out a cheat sheet of regexes. Like there was so much work that was being done just to generate that. 00:15:45.200 |
Now you can just have somebody alongside with you like a copilot to help you generate that custom secret. 00:15:50.960 |
And last but not least, actually, I think this is one of the most important ones from a secret 00:15:56.880 |
scanning perspective, is to detect unstructured passwords. How often do you have password equals? 00:16:02.960 |
I mean, very, very often. Let me tell you, let me give you that answer. Very often. How often is that 00:16:09.120 |
actually vulnerable? Is that a real password? Is that actually being exploited? Probably not very often. 00:16:16.160 |
Probably way less often than how often you have password equals somewhere. But there's so many 00:16:21.760 |
types of unstructured passwords like that, where you can define a password of sorts, but never know if 00:16:27.040 |
it's actually being exploited. So what we're doing, and think of it as an AI analysis of 00:16:31.600 |
the repository, is to understand if that password is a true positive. So what we're doing is finding passwords, 00:16:38.400 |
and we label them today as other because they are definitely going to have some false positives in 00:16:43.760 |
them in the first iteration of this. But it's going to identify those passwords and make it easier for 00:16:48.640 |
you to say, hey, these are actually vulnerable passwords that we have exposed in our Git history. 00:16:53.360 |
So we need to revoke them, rotate them and start storing them in our Azure Key 00:16:58.240 |
Vault or wherever we want to. This is going to be such a game changer in terms of passwords that are 00:17:03.920 |
internal to your company. This is going to be a game changer for passwords that aren't really very 00:17:08.240 |
structured in general, that allow you to do things, and that you shouldn't be storing in a Git repository. 00:17:16.720 |
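To show why "password equals" is so noisy, here is a minimal sketch in Python of a purely rule-based detector. It is a deliberately naive baseline, not the AI analysis described in the talk: the regex, the placeholder list, and the `candidate_passwords` helper are all hypothetical and only illustrate how much hand-written filtering it takes to separate placeholders from values that might be real.

```python
import re

# Naive pattern: a key containing password/passwd/pwd assigned a quoted string value.
ASSIGNMENT = re.compile(r'(?i)(?:password|passwd|pwd)\s*[=:]\s*["\']([^"\']+)["\']')

# Illustrative list of values that are almost certainly placeholders, not credentials.
PLACEHOLDERS = {"changeme", "password", "example", "your-password-here", "xxxx"}


def candidate_passwords(source: str) -> list[str]:
    """Return assigned values that survive the placeholder and template filters."""
    values = ASSIGNMENT.findall(source)
    return [
        value
        for value in values
        if value.lower() not in PLACEHOLDERS and not value.startswith("${")
    ]


sample = '''
password = "changeme"
db_password = "${DB_PASSWORD}"
PASSWORD: 'hunter2-prod!'
'''

# Only the last line is kept; the first is a placeholder, the second a template reference.
print(candidate_passwords(sample))
```

Every new false positive in an approach like this means another hand-written rule, which is the gap the contextual analysis is meant to close.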
But this is, again, a way of AI helping find and discover things much more easily than 00:17:25.680 |
you could have before. So the easier identification, faster remediation. So the easier you can identify, 00:17:32.800 |
the faster you can remediate it. But let's go into a demo. So I have about maybe like 10 minutes left 00:17:41.920 |
here. So really quickly, I kind of want to just start off with a GitHub repository here or a GitHub 00:17:46.880 |
organization here. You can see there's a security tab at the top of your GitHub organization. If you have 00:17:52.880 |
admin access, or there's a security manager role as well, you'll be able to see the security tab. 00:17:58.080 |
If you have GitHub Advanced Security on, it will actually give you a lot more information. If you 00:18:01.760 |
don't, you'll have some Dependabot information going on here. But this security tab gives you an 00:18:06.640 |
overview of all the information that you have across this organization. So in this example here, 00:18:11.680 |
what you're seeing is that there's so many open alerts. This is our demo repository. This is not production 00:18:16.640 |
code. So do not worry. This is not going to be deployed anywhere in Azure. So we are safe for 00:18:22.720 |
today. But nonetheless, 73,000 alerts is a lot. But you can identify these alerts and find more patterns 00:18:31.200 |
of what's going on, whether it's secrets being identified, or vulnerabilities from 00:18:38.480 |
your SAST scanning, or vulnerabilities from your third party integrations, or dependency 00:18:42.480 |
vulnerabilities from Dependabot. So you can see a lot more information and statistics on, hey, what's the 00:18:48.480 |
age of some of the alerts? These alerts have been living for a long time. What's the remediation 00:18:53.920 |
timeline? How often are you remediating these? How often are you actually resolving these? And then you 00:18:59.680 |
can understand the impact analysis of all the different repositories. This is one of the views. 00:19:05.280 |
So my actual favorite view is the coverage view. Because the concern for a lot of customers is like, 00:19:11.440 |
how do we know what we don't know? So in many cases, people believe that they have full coverage of 00:19:18.080 |
everything. But it's usually done in the CI pipeline. So you're not actually getting coverage 00:19:23.680 |
unless you're going through a CI pipeline. But there's so many more repositories that probably are just 00:19:28.000 |
sitting there for just basic automation, something else that you're just running that's not going 00:19:32.400 |
through your CI pipeline. So you're never actually running security scanning across all of those. 00:19:36.400 |
So in this view here, you can get a true identification of how many of your repositories are 00:19:41.920 |
actually covered. So you can see secret scanning, code scanning, and Dependabot. In many cases, 00:19:47.120 |
secret scanning is an easy one-click button to turn on. So obviously, there's just a lot more coverage across 00:19:51.840 |
secret scanning, 99% here. We're hopefully going to get to 100 at some point in time. But for code scanning, 00:19:57.280 |
it's 57. What do we need to do? Why is it 57? Does that make sense? Should we not be scanning more 00:20:03.200 |
things with code scanning? And that's where I think this provides really the best value. 00:20:08.000 |
That's from an organization perspective, or an admin, or if you're a security manager. 00:20:12.880 |
As a developer, I want to go into my repository, similar to your repository. You can see a security tab 00:20:18.320 |
here. This security tab really allows you to understand what's going on across that specific 00:20:22.800 |
repository. So let's go into one of these here. So I'm in my code scanning repository here, or sorry, 00:20:29.520 |
in my repository in the code scanning alerts here. I can see all of the alerts listed out where I need 00:20:35.040 |
to start fixing. So this is a lot more reactive work. We found these vulnerabilities. How can we start 00:20:40.000 |
fixing them? We have a big backlog. We have some tech debt. This is a place where I'm going to go and 00:20:44.720 |
understand what's going on. So then I can go fix those. So when I go into one of them, for instance, 00:20:49.440 |
actually, let's go into this one here. When we go into one of them, for instance, we can see 00:20:54.640 |
specifically what the vulnerability is. So if we click on show paths, you can see from source all the 00:21:00.080 |
way down to the sink of what the vulnerability is. So as a developer, I can understand where I need 00:21:06.240 |
to start fixing these vulnerabilities. But in many cases, there's more than one way to an exploit. 00:21:12.160 |
There's more than one way to get to that exposure point. So how do we identify that and understand 00:21:17.440 |
that? That's what CodeQL does really well. It identifies all the different paths. So if I go into maybe 00:21:22.560 |
step eight, it looks like there's a different source, but it's the same sink in this example. So why not 00:21:28.400 |
actually fix a vulnerability in step seven? So then I can find the common denominator across all of 00:21:32.880 |
those. So that is the more reactive work. What we really want to do with AI is to be more proactive. 00:21:40.720 |
So now I am in a pull request. I, not me, but Mr. Left Rife left here actually introduced a vulnerability. 00:21:48.640 |
In this vulnerability, he, let's see what the vulnerability is, cross-site scripting. So he introduced 00:21:53.760 |
some cross-site scripting, easy mistake to make, very common vulnerability, 00:21:57.600 |
usually an easy fix. But as a developer, I never really knew what it was. So I can get a better 00:22:02.320 |
understanding of what that is. So Advanced Security will actually tell you, hey, 00:22:06.480 |
this is the vulnerability. This is the information around that vulnerability. But the autofix aspect 00:22:11.280 |
will actually be very specific. So this is the specific solution to this vulnerability. So now before 00:22:17.200 |
you even merge your code into your production, main branches, develop branches, 00:22:21.680 |
you can get results and an answer back on how to resolve that vulnerability. So finding that vulnerability 00:22:28.400 |
and remediating it all within the pull request. And that's the power of AI in this case. So in this 00:22:33.520 |
example here, it's asking to install the escape-html library and import that in, and actually that resolves 00:22:39.440 |
your vulnerability fairly easily. But it could have saved me like a couple minutes, a couple hours, a 00:22:44.880 |
couple days, depending on how much I knew about this pull request or how much I knew about the code or how 00:22:49.600 |
much I understood about this vulnerability to actually make that fix. In this case, it took me just reading 00:22:55.200 |
through this. And obviously I'm still the developer. I still want to do my analysis, 00:23:00.320 |
understand if it's the right answer. But I can then decide to commit that fix. And as soon as I commit 00:23:05.760 |
that fix, it will rerun all the scans. So we can see if that vulnerability is actually remediated right off the bat. 00:23:10.480 |
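To make the kind of fix shown in the demo concrete, here is a minimal sketch of escaping user input before it is placed into HTML. The demo's suggestion was to install and import an HTML-escaping library in the project's own language; this sketch swaps in Python's standard `html.escape` to show the same idea, and the `greeting_unsafe` and `greeting_safe` function names are hypothetical.

```python
import html


def greeting_unsafe(name: str) -> str:
    """Vulnerable: user input is interpolated into HTML as-is."""
    return f"<p>Hello, {name}!</p>"


def greeting_safe(name: str) -> str:
    """Fixed: special characters are escaped before reaching the page."""
    return f"<p>Hello, {html.escape(name)}!</p>"


payload = "<script>alert(1)</script>"

print(greeting_unsafe(payload))  # the script tag survives intact
print(greeting_safe(payload))    # the script tag is rendered as inert text
```

The fix is a one-line change at the point where the value reaches the page, which is why a suggestion like this is quick to review and commit from the pull request.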
So that's our AI autofix. And I know we have only like one more minute left. But at the end of the 00:23:17.520 |
day, what we really want to show is how AI can really improve that experience. And this is just 00:23:21.920 |
one example. And for more examples, like generating custom patterns for secrets, I can show that at the 00:23:26.640 |
Microsoft booth if you want to stop by later on. Generating custom patterns with AI, detecting other types 00:23:32.240 |
of secrets with AI. I can show all of that at the Microsoft booth later on. Awesome. Thank you.