back to index

GitHub's AI Powered Security Platform: Sarah Khalife


Whisper Transcript | Transcript Only Page

00:00:00.000 | Thank you all for joining today. Thank you for staying if you stayed from the previous session
00:00:17.480 | as well. I know there were a bunch of GitHub talks, so I'm very excited that you got to hear
00:00:24.000 | from all of us from all different places within the company, but also from all different teams.
00:00:28.800 | So, you heard a lot about co-pilots, a lot, a lot about co-pilot. Co-pilot today, co-pilot
00:00:34.800 | tomorrow, co-pilot in the next year, maybe in a couple years. So, from workspaces to get
00:00:39.700 | a co-pilot enterprise to co-pilot futures, there's so much more going on with co-pilot
00:00:44.800 | itself. But what we really want to actually do within GitHub is not only incorporate some
00:00:50.160 | of the new features and capabilities within co-pilot only, but we also want to incorporate
00:00:54.400 | it as part of the platform. So, everything within the platform itself will start having
00:00:59.360 | AI in it. So, today, you've heard a lot about co-pilot. Obviously, we're at an AI conference.
00:01:03.800 | So, if you didn't, that would have been a bad thing. But with collaboration, productivity and
00:01:08.800 | security, those are some of our biggest aspects of the GitHub core platform capabilities there.
00:01:14.400 | From pull requests, as you heard some of the code review things that were happening earlier on. But also from
00:01:19.360 | productivity perspective, how do you gain momentum and get more developers excited to do their work across
00:01:24.800 | their developer platforms, especially if they're using GitHub co-pilots. How do you get that momentum going
00:01:30.800 | from a productivity perspective? And then, security. Security, I feel like, hasn't been talked about as much in terms of how can we
00:01:37.360 | how can we improve app sec with AI rather than just how do we talk about security around AI. Security around AI is
00:01:46.320 | probably something talked about all the time from legal to privacy to everything in between. But what we want to
00:01:53.280 | do at GitHub is to also incorporate AI to make sure that security is also being progressed along with the new
00:01:59.600 | capabilities that we're seeing today. And everything that GitHub does, we want to do it with scale. We want to build our
00:02:06.000 | integrations and our APIs. So anything that I've talked about today is generally API backed or there's a third party integrator that
00:02:12.320 | also comes in and helps with a lot of the capabilities. There's never one path that
00:02:16.800 | solves all the answers. So we want to make sure that all of our customers, you know, have a path away to a solution, have a
00:02:22.720 | path to that solution. Today, if you didn't know, actually, this might be not the latest numbers, actually. But I
00:02:30.880 | think the numbers are more. I haven't updated the slide in a while. But there's over 100 million
00:02:35.760 | developers registered on GitHub. There's 4 million organizations, 90% of the Fortune 100 companies,
00:02:42.880 | which sometimes I'm surprised by the numbers because I know everybody uses GitHub. But like 90% of the
00:02:48.560 | Fortune 100 is mind blowing. I work with customers day in, day out. But this is still mind blowing to see,
00:02:54.720 | because that is how we get a lot of our feedback. That's how we get a lot of our community part of the
00:02:59.760 | conversation. That's how we build a lot of our features. Anything that we do, we incorporate that community
00:03:05.040 | back into the conversation. By means of introductions, I'm Sarah Khalifa. I'm a principal
00:03:11.600 | solutions engineer here at GitHub. I've been at GitHub for about four and a half years now, which is a
00:03:17.360 | long time for GitHub's life. And it was pre-copilot, pre some of the capabilities, almost pre-actions,
00:03:26.240 | if you use actions. So, you know, we've seen the platform grow a lot during the last couple of years.
00:03:31.920 | But like I said, the community is what really makes it the biggest, better platform across the board.
00:03:37.120 | Our customers, our vendors, our partners, everybody that is part of that conversation is how we really get
00:03:42.960 | into the next iteration of what we're going to build. And I'm lucky enough to work with a lot of
00:03:48.160 | customers. So I'm working on the customer side of things. All my customers today are financial services,
00:03:52.720 | but over the last four and a half years, I've been working with all types of customers from all
00:03:57.280 | different parts of the industries outside in this enterprise world that we live in.
00:04:01.920 | So I did a quick introduction of what is the GitHub platform today. We talked about
00:04:08.800 | co-pilot, but that's only one aspect. We talked a little about collaboration. You heard about it earlier
00:04:13.360 | on through Christina and Chris's session. You also heard a lot about productivity, especially with
00:04:18.800 | copilot being in that conversation. But today we're going to talk about GitHub advanced security and
00:04:24.000 | how we're incorporating AI into that. If you haven't heard of GitHub advanced security, don't worry,
00:04:29.200 | I'm going to cover what is GitHub advanced security. And then we're going to talk about some of the new
00:04:33.600 | features are coming along that are the AI aspects into advanced security. And then we'll do a live demo,
00:04:38.240 | because as you can see, all of our GitHub teams here love doing live demos, even if they don't work,
00:04:42.640 | or if they work better than they expect, which is a very great example of what happened in the earlier session today.
00:04:48.160 | Okay. So we'll be focusing on one aspect of the platform, and then we'll be talking about
00:04:55.360 | specifically security. So how can we incorporate security in our day-to-day work? And then how can
00:04:59.760 | AI improve that experience for developers? Who here has used GitHub advanced security? Anybody?
00:05:06.880 | Nice. A couple people. So what is GitHub advanced security? So GitHub advanced security allows you to
00:05:14.160 | incorporate, think of it as your AppSec aspects into your developer platform day in, day out, and has that
00:05:21.360 | GitHub experience. So when we talk about advanced security, we have code scanning. Code scanning allows
00:05:26.320 | you to do SaaS and other types of code scanning within the developer platform, within the GitHub ecosystem,
00:05:32.080 | to find vulnerabilities and detect different patterns that are vulnerable to then help remediate them
00:05:36.720 | faster. With code scanning today, there are two aspects that I would say are the more popular aspects
00:05:42.080 | to talk about. And we can talk about a lot more if you want to come to the GitHub, Microsoft and GitHub
00:05:46.480 | booth later on. But code scanning today allows you to detect vulnerabilities using code QL, which is our
00:05:52.880 | internal or proprietary language, I guess, but it's open source. So you can actually build it,
00:05:57.840 | your own queries yourself. But with code scanning today, it allows you to detect different vulnerable
00:06:03.440 | patterns across your code. We'll find the data flow and be able to analyze that data flow to understand
00:06:08.400 | where the vulnerability is, and what type of source and exploits you have within that vulnerability.
00:06:14.000 | So that makes it super powerful, especially when we talk about some of the AI aspects that we're adding
00:06:18.240 | to it, because we have the context and we have that information in there. With code QL today,
00:06:23.680 | there are more than I don't even know how many query packs that we offer internally or from the GitHub
00:06:30.240 | side. But we also bring in that community. So anything that we do is community backed. And sorry,
00:06:34.880 | I keep hitting the mic. But anything that we do is community backed. So when we talk about code QL,
00:06:39.440 | it's not just our queries, it's Microsoft queries, it's Google's queries, it's Uber's queries, and so forth.
00:06:44.240 | And we build that aspect of community in even in security, just because we know there's never going to
00:06:49.440 | be GitHub is going to answer every single question that you may have, especially for the companies
00:06:54.160 | that you work for. The other aspect of code scanning is to incorporate third party integrations. So again,
00:07:00.000 | this is where our vendor and partnership comes through. So if you're using something like container
00:07:04.000 | scanning, code scanning today, code QL specifically is more focused on SaaS. We're not going to do
00:07:09.280 | container scanning today, at least not in the near future. So why not incorporate some of the results
00:07:14.320 | back into your pull request and get that information and feedback sooner than later. So that's an aspect
00:07:20.080 | of third party integrations, code scanning, infrastructures, code scanning, sorry, container
00:07:25.360 | scanning, infrastructures, code scanning, or any other third parties that you want to integrate with,
00:07:29.120 | you can do that through serif inputs through code scanning. The biggest win, which all my customers have
00:07:35.440 | loved, and I don't know if you felt it if you're doing even open source work is secret scanning. Secret
00:07:41.360 | scanning is a lifesaver in many, many cases. How many times has somebody accidentally left some,
00:07:48.400 | you know, secrets, maybe AWS or Azure keys in there in their log file without realizing and, you know,
00:07:56.640 | it happens. You do a test file and you forget to add it to your git ignore. It saves, it saves a lot of
00:08:03.600 | Bitcoin miners being spun up within your environment. So secret scanning does a full-blown analysis of secrets
00:08:11.040 | across your repositories from API keys to your own custom secrets and using AI that I'll be talking
00:08:17.680 | about in a little bit on how to detect other types of secrets that are just plain text but really hard
00:08:23.120 | to reduce the amount of false positives on them. AI can really help with that. So with secret scanning,
00:08:28.400 | it's been the biggest win across all of my customers and it's been a big, big discussion on how do we
00:08:33.200 | prevent things but also how do we reactively and proactively prevent things. So proactively what we
00:08:40.080 | have incorporated was push protection so it allows you to block any pushes that are coming to the GitHub
00:08:45.600 | ecosystem before the secret is being exposed or before it goes into your git commit history so then you can,
00:08:51.440 | you don't have to revoke it at that point. But what's in your git commit history? We don't recommend
00:08:55.840 | deleting anything in your git commit history especially if you work for a regulated industry
00:09:00.080 | that goes through audits and has to maintain a lot of historical aspects very precisely. That's where
00:09:06.800 | we just recommend, hey, we are able to detect not only your current state but all your git history
00:09:11.920 | and other issues, pull requests, comments and pull requests and so forth if there's any secrets in
00:09:16.720 | those and we really recommend revoking them. And last but not least, supply chain security. If you
00:09:24.320 | have maybe heard of it as Dependabot is one of the tools within the supply chain security aspect of it,
00:09:30.400 | Dependabot allows you to detect dependencies that are vulnerable today. So with Dependabot, it gives you
00:09:36.160 | an opportunity to say, hey, I found vulnerabilities for these dependents or found that these dependencies are
00:09:41.360 | vulnerable. So maybe we need to upgrade to the latest version. There's going to be some additional AI
00:09:45.840 | components are being added to that as well. But that's something that I won't be covering as much
00:09:49.280 | today because that's still early earlier on in the stages there. Some of our secret scanning partners
00:09:56.240 | that we work with today are very much common vendors that you might be working with throughout the
00:10:02.240 | your day to day. But this is where again, we talk about the community and the vendors and the partners
00:10:07.040 | that we work with. Because what we do is for secret scanning is not only
00:10:10.800 | incorporate their patterns, but we also push them to improve how they're doing the patterns to make
00:10:16.240 | sure that their vulnerability, their secrets are in general, are not going to create a lot of false
00:10:22.160 | positives. So we create a kind of like a mechanism for them to add hashes and more kind of more specific
00:10:28.240 | information to be able to detect these secrets almost at 99%, 99.9 something percent. I don't know the
00:10:34.880 | exact average that we have today, but it's pretty high up there and it reduces the amount of false
00:10:39.280 | positives so significantly when you use some of our high fidelity partners that we work with.
00:10:45.280 | So, at any given point in time, GitHub really believes security should be part of the day-to-day
00:10:53.840 | responsibilities of everybody. It's a shared responsibility. It's never just AppSec saying,
00:10:58.720 | hey, you need to fix these 10 vulnerabilities by tomorrow or else we can't deploy. It's never just
00:11:05.360 | the developer trying to figure out how to fix this vulnerability that they've never even heard of or
00:11:10.240 | maybe not even understand to then be able to deploy on time. So it should be more of a shared
00:11:16.160 | responsibility. So our goal is to bridge that gap and make that conversation a lot easier. So anything that
00:11:20.640 | we do with advanced security, anything that we're doing with AI allows you to really add that aspect
00:11:25.280 | into it. So what can AI do for us? How can we benefit with, how can AI benefit security? With AI, there's so
00:11:36.320 | much more that you can do, especially with generative AI, as you can see with co-pilot, with all the new
00:11:41.280 | customers, vendors, partners that you're seeing here at this conference. There's just so many aspects to
00:11:46.240 | it. So the first couple of things that we've noticed right off the bat is easier identification.
00:11:51.360 | How can we help, how can AI help us identify vulnerabilities or secrets much easier? How can we
00:11:58.160 | have faster remediation? So when you identify things, if you're not fixing them, then what's the point of
00:12:03.280 | identifying them half the time, right? If you're not going to fix the vulnerabilities, that's where the
00:12:07.760 | actual issue is. It's easy to find vulnerabilities lately, a lot more easier than they were before,
00:12:14.160 | but fixing them is the actual issue. And that's where the productivity aspect also comes into play.
00:12:19.360 | And last but not least, driving that productivity. The faster you're able to fix vulnerabilities, the
00:12:23.920 | faster you're able to be a little more productive, increase your like security risk postures of where
00:12:29.440 | your company may be today, reduce the amount of, you know, turmoil that you have to hit, you know,
00:12:35.680 | by deploying earlier on with the fix rather than waiting to like production or after production,
00:12:40.720 | or when a customer is using your product already. But in general, this is where AI, we see AI really
00:12:46.160 | helping introduce a lot more of that, more of those capabilities. So first, but not the most important,
00:12:55.520 | but it is probably the biggest one that we are very excited for is code scanning autofix. With code
00:13:02.000 | scanning autofix, not only are we helping detect vulnerabilities with code scanning,
00:13:07.040 | but now we're providing a way to autofix those with AI. So in the pull request, as you're working
00:13:13.520 | actively, it will actually provide a response back to say, hey, maybe you should be fixing this
00:13:19.040 | vulnerability this specific way. And it'll give you a suggestion. Obviously, it's AI. So it's going to give
00:13:23.840 | you a suggestion of what it thinks based on the context it has, you can always edit it, you can always fix it,
00:13:28.800 | or you can commit it and rerun your test and see if it actually fixes that vulnerability.
00:13:32.800 | With code scanning autofix, code QL is what's providing a lot of that context. So code QL is
00:13:39.120 | finding the data flow of that vulnerability, you're getting information of what that vulnerability common
00:13:44.400 | fixes are when you're doing code scanning. So providing the context in the way that we are doing
00:13:48.960 | our backend system to prompt that request, it's actually providing a really, really good autofix
00:13:54.960 | result. From our customers and from all of my customers that have tested this today, they found
00:14:00.000 | that autofix has been pretty successful for, I don't know, maybe 70% of their use cases. But again, this is
00:14:06.800 | going to only get better as we're working with more customers as more people are starting to test this
00:14:11.760 | out. And it's in public beta today, so you can actually test this out yourself if you're interested.
00:14:15.600 | Second to this is the secret scanning improvements. Some of the aspects of creating a custom pattern
00:14:24.000 | requires a lot of work. I mean, I don't know, who knows regex to the point where they can, they feel
00:14:30.880 | confident, confident rolling out a regex scan across all your repositories, right? I personally cannot
00:14:39.680 | claim that I do. My regex, I mean, it's not bad, but it's not the best that it can be. So why not
00:14:45.600 | have AI help us custom generate those regexes? So with custom pattern generation, you can provide AI
00:14:51.760 | capabilities to maybe suggest a different way to write some of your regexes. So you can provide samples
00:14:58.320 | and examples of what you're looking for. And then AI, our secret can or secret scanning custom pattern
00:15:04.080 | generation would generate a custom pattern for you to at least have a starting point if you don't think
00:15:09.280 | that's the full answer just yet. But it generates that custom pattern. And it makes it so much easier
00:15:14.480 | to give more and more examples because the more context it has, the better answer will provide.
00:15:18.720 | And it will generate a response back so you don't have to figure out how to
00:15:22.320 | write this new on search for this type of regex to find a custom pattern.
00:15:25.760 | So this has simplified the process so much. And a lot of our customers have loved, loved,
00:15:31.200 | loved having this capability because they were doing that anyway, probably on chat GPT,
00:15:35.920 | or maybe going to copilot in their IDE, or maybe they were doing this on Google and trying to figure
00:15:40.400 | out cheat sheet with regexes. Like there was so much work that was being done just to generate that.
00:15:45.200 | Now you can just have somebody alongside with you like a copilot to help you generate that custom secret.
00:15:50.960 | And last but not least, actually, I think this is one of the most important ones from a secret
00:15:56.880 | scanning perspective, is to detect unstructured passwords. How often do you have password equals?
00:16:02.960 | I mean, very, very often. Let me tell you, let me give you that answer. Very often. How often is that
00:16:09.120 | actually vulnerable? Is that a real password? Is that actually being exploited? Probably not very often.
00:16:16.160 | Probably way less often than how often you have password equals somewhere. But there's so many
00:16:21.760 | types of unstructured passwords like that, where you can define a password of sorts, but never know if
00:16:27.040 | it's actually being exploited. So what we're doing, we're doing, and think of it as an AI analysis of
00:16:31.600 | the repository to understand if that password is a true positive. So what we're doing is finding passwords,
00:16:38.400 | and we label them today as other because they are definitely going to have some false positive in
00:16:43.760 | them in the first iteration of this. But it's going to identify those passwords and make it easier for
00:16:48.640 | you to say, hey, these are actually vulnerable passwords that we have exposed in our Git history.
00:16:53.360 | So we need to revoke them, rotate them and start storing them in our Azure core vault or Azure key
00:16:58.240 | vault or wherever we want to. This is going to be such a game changer in terms of like passwords that are
00:17:03.920 | internal to your company. This is going to be a game changer for passwords that aren't really very
00:17:08.240 | structured in general that allow you to do things that you shouldn't be storing in a Git repository.
00:17:16.720 | But this is, again, where it's a way of AI helping finding and discovering easier, much more easily than
00:17:25.680 | you could have before. So the easier identification, faster remediation. So the easier you can identify,
00:17:32.800 | the faster you can remediate it. But let's go into a demo. So I have about maybe like 10 minutes left
00:17:41.920 | here. So really quickly, I kind of want to just start off with a GitHub repository here or a GitHub
00:17:46.880 | organization here. You can see there's a security tab at the top of your GitHub organization. If you have
00:17:52.880 | admin access or if you're there's a security manager role as well, you'll be able to see the security tab.
00:17:58.080 | If you have GitHub advanced security on, it will actually give you a lot more information. If you
00:18:01.760 | don't, you'll have some dependent upon information going on here. But this security tab gives you an
00:18:06.640 | overview of all the information that you have across this organization. So in this example here,
00:18:11.680 | what you're seeing is that there's so many open alerts. This is our demo repository. This is a production
00:18:16.640 | code. So do not worry. This is not going to be deployed anywhere in Azure. But we are safe for
00:18:22.720 | today. But nonetheless, 73,000 alerts is a lot. But you can identify these alerts and find more patterns
00:18:31.200 | of what's going on based on the secrets. If it's secrets being identified or if it's vulnerabilities from
00:18:38.480 | your SAS scanning or if it's vulnerabilities from your third party integrations or if it's dependency
00:18:42.480 | vulnerabilities from dependabot. So you can see a lot more information and statistics on, hey, what's the
00:18:48.480 | age of some of the alerts? These alerts have been living for a long time. What's the remediation time
00:18:53.920 | timeline? How often are you remediating these? How often are you actually resolving these? And then you
00:18:59.680 | can understand the impact analysis of all the different repositories. This is one of the views.
00:19:05.280 | So my actual favorite view is the coverage view. Because the concern for a lot of customers is like,
00:19:11.440 | how do we know what we don't know? So in many cases, people believe that they have full coverage of
00:19:18.080 | everything. But it's usually done in CI and in the CI pipeline. So you're not actually getting coverage
00:19:23.680 | unless you're going through a CI pipeline. But there's so many more repositories that probably are just
00:19:28.000 | sitting there for just basic automation, something else that you're just running that it's not going
00:19:32.400 | through your CI pipeline. So you're never actually running security scanning across all of those.
00:19:36.400 | So in this view here, you can get a true identification of how many of your repositories are
00:19:41.920 | actually covered. So you can see secret scanning, code scanning, and dependabot. In many cases,
00:19:47.120 | secret scanning is an easy one-click button on. So obviously, there's just a lot more coverage across
00:19:51.840 | secret scanning, 99% here. We're hopefully going to get 100 at one point in time. But for code scanning,
00:19:57.280 | it's 57. What do we need to do? Why is it 57? Does it make sense that should we not be scanning more
00:20:03.200 | things with code scanning? And that's where I think this provides really the best value.
00:20:08.000 | That's from an organization perspective, or an admin, or if you're a security manager.
00:20:12.880 | As a developer, I want to go into my repository, similar to your repository. You can see a security tab
00:20:18.320 | here. This security tab really allows you to understand what's going on across that specific
00:20:22.800 | repository. So let's go into one of these here. So I'm in my code scanning repository here, or sorry,
00:20:29.520 | in my repository in the code scanning alerts here. I can see all of the alerts listed out where I need
00:20:35.040 | to start fixing. So this is a lot more reactive work. We found these vulnerabilities. How can we start
00:20:40.000 | fixing them? We have a big backlog. We have some tech debt. This is a place where I'm going to go and
00:20:44.720 | understand what's going on. So then I can go fix those. So when I go into one of them, for instance,
00:20:49.440 | actually, let's go into this one here. When we go into one of them, for instance, we can see
00:20:54.640 | specifically what the vulnerability is. So if we click on show paths, you can see from source all the
00:21:00.080 | way down to the sink of what the vulnerability is. So as a developer, I can understand that where I need
00:21:06.240 | to start fixing these vulnerabilities. But in many cases, there's more than one way to an exploit.
00:21:12.160 | There's more than one way to get to that exposure point. So how do we identify that and understand
00:21:17.440 | that? That's what CodeQL does really well. It identifies all the different paths. So if I go into maybe
00:21:22.560 | step eight, it looks like there's a different source, but it's the same sink in this example. So why not
00:21:28.400 | actually fix a vulnerability in step seven? So then I can find the common denominator across all of
00:21:32.880 | those. So that is the more reactive work. What we really want to do with AI is to be more proactive.
00:21:40.720 | So now I am in a pull request. I, not me, but Mr. Left Rife left here actually introduced a vulnerability.
00:21:48.640 | In this vulnerability, he, let's see what the vulnerability is, cross-site scripting. So he introduced
00:21:53.760 | some cross-site scripting, easy mistake to make, very common vulnerability,
00:21:57.600 | usually an easy fix. But as a developer, I never really knew what it was. So I can get a better
00:22:02.320 | understanding of what that is. So dependent, advanced security will actually tell you, hey,
00:22:06.480 | this is the vulnerability. This is the information around that vulnerability. But the autofix aspect
00:22:11.280 | will actually be very specific. So this specific solution is to this vulnerability. So now before
00:22:17.200 | you even merge your code into your production, main branches, develop branches,
00:22:21.680 | you can get results and an answer back on how to resolve that vulnerability. So finding that vulnerability
00:22:28.400 | and remediating it all within the pull request. And that's the power of AI in this case. So in this
00:22:33.520 | example here, it's asking to install the escape HTML library and import that in and actually that resolves
00:22:39.440 | your vulnerability fairly easily. But it could have saved me like a couple minutes, a couple hours, a
00:22:44.880 | couple days, depending on how much I knew about this pull request or how much I knew about the code or how
00:22:49.600 | much I understood from this vulnerability to actually make that fix. In this case, it took me just reading
00:22:55.200 | through this. And I want to make sure I obviously I'm still the developer. I still want to do my analysis,
00:23:00.320 | understand if I if it's the right answer. But I can then decide to commit that fix. And as soon as I commit
00:23:05.760 | that fix, it will rerun all the scans. So we can see if that vulnerability is actually remediated right off the bat.
00:23:10.480 | So that's our AI autofix. And I know we have only like one more minute left. But at the end of the
00:23:17.520 | day, what we really want to show is how AI can really improve that experience. And this is just
00:23:21.920 | one example. And the more examples are if we're generating some secrets, I can show that at the
00:23:26.640 | Microsoft booth if you want to stop by later on. Generating the secrets with AI, detecting other types
00:23:32.240 | of secrets with AI. I can show all of that at the Microsoft booth later on. Awesome. Thank you.
00:23:40.320 | We'll see you next time.