Ian Goodfellow: Generative Adversarial Networks (GANs) | Lex Fridman Podcast #19
Chapters
0:00 Introduction
1:08 Deep learning limitations
2:42 Function estimators
6:53 Self-awareness
8:58 Difficult cases
12:50 Hidden voice commands
14:00 Writing a deep learning chapter
16:44 What is deep learning
18:28 What is an example of deep learning
20:36 What could an alternative direction of training neural networks look like
21:43 Are you optimistic about us discovering something better
24:17 How do we build knowledge representation
25:17 Differentiable knowledge bases
26:40 GANs at a bar
27:54 Deep Boltzmann machines
30:20 What are GANs
33:26 How do GANs work
36:44 Types of GANs
39:51 History of GANs
43:31 Semi-supervised GANs
44:26 Class labels
46:22 Zebra cycle
49:31 Data augmentation
52:10 Fairness
00:00:00.000 |
The following is a conversation with Ian Goodfellow. 00:00:03.720 |
He's the author of the popular textbook on deep learning, 00:00:08.920 |
He coined the term of generative adversarial networks, 00:00:37.600 |
and now at Apple as the director of machine learning. 00:00:41.560 |
This recording happened while Ian was still at Google Brain, 00:00:45.400 |
but we don't talk about anything specific to Google 00:00:54.520 |
If you enjoy it, subscribe on YouTube, iTunes, 00:00:57.560 |
or simply connect with me on Twitter @lexfridman, 00:01:03.000 |
And now, here's my conversation with Ian Goodfellow. 00:01:17.120 |
which in turn is a subset of machine learning, 00:01:22.520 |
So this kind of implies that there may be limits 00:01:27.720 |
So what do you think is the current limits of deep learning, 00:01:35.760 |
- Yeah, I think one of the biggest limitations 00:01:39.320 |
it requires really a lot of data, especially labeled data. 00:01:47.140 |
that can reduce the amount of labeled data you need, 00:01:49.480 |
but they still require a lot of unlabeled data. 00:01:52.200 |
Reinforcement learning algorithms, they don't need labels, 00:02:01.600 |
So just getting the generalization ability better 00:02:25.560 |
You use deep learning as sub-modules of other systems, 00:02:42.520 |
- So you're basically building a function estimator. 00:02:50.180 |
about this so far, but do you think neural networks 00:02:52.280 |
could be made to reason in the way symbolic systems did 00:02:58.800 |
create more like programs as opposed to functions? 00:03:01.480 |
- Yeah, I think we already see that a little bit. 00:03:03.980 |
I already kind of think of neural nets as a kind of program. 00:03:08.880 |
I think of deep learning as basically learning programs 00:03:23.540 |
as describing the number of steps that run in sequence, 00:03:43.740 |
You could have a lot of input features to the model, 00:03:45.660 |
and you could multiply each feature by a different weight. 00:03:48.140 |
All those multiplications were done in parallel 00:03:54.360 |
was really the ability to have steps of a program 00:04:00.340 |
And I think that we've actually started to see 00:04:05.020 |
is more the fact that we have a multi-step program 00:04:07.980 |
rather than the fact that we've learned a representation. 00:04:10.780 |
If you look at things like ResNets, for example, 00:04:15.140 |
they take one particular kind of representation 00:04:21.060 |
Back when deep learning first really took off 00:04:40.420 |
and eventually you get these kind of grandmother cell units 00:04:51.980 |
you can do more updates before you output your final number. 00:04:56.420 |
that layer 150 of the ResNet is a grandmother cell, 00:05:01.420 |
and layer 100 is contours or something like that. 00:05:08.180 |
as a singular representation that keeps building. 00:05:21.500 |
and arrives at better and better understandings, 00:05:23.820 |
but it's not replacing the representation at each step. 00:05:29.160 |
And in some sense, that's a little bit like reasoning. 00:05:33.560 |
but it's reasoning in the form of taking a thought 00:05:41.260 |
- So do you think, and I hope you don't mind, 00:05:43.580 |
we'll jump philosophical every once in a while. 00:05:53.500 |
of this kind of sequential representation learning, 00:06:06.440 |
I guess there's, consciousness is often defined 00:06:12.060 |
and that's relatively easy to turn into something actionable 00:06:19.700 |
in terms of having qualitative states of experience, 00:06:22.420 |
like qualia, and there's all these philosophical problems, 00:06:27.820 |
who does all the same information processing as a human, 00:06:30.700 |
but doesn't really have the qualitative experiences 00:06:34.660 |
That sort of thing, I have no idea how to formalize 00:06:44.860 |
And similarly, I don't know how you could run an experiment 00:06:49.640 |
had become conscious in the sense of qualia or not. 00:06:58.900 |
in an impressive way, emerge from current types 00:07:03.220 |
of architectures that we think of as deep learning. 00:07:07.940 |
in terms of self-awareness and just making plans 00:07:12.180 |
based on the fact that the agent itself exists in the world, 00:07:20.140 |
to model the agent's effect on the environment. 00:07:23.060 |
So that more limited version of consciousness 00:07:26.340 |
is already something that we get limited versions of 00:07:52.500 |
if we get much better on supervised learning, 00:08:00.620 |
do you think we'll start to see really impressive things 00:08:16.420 |
I do think it'll be important to get the right kind of data. 00:08:20.100 |
Today, most of the machine learning systems we train 00:08:23.140 |
are mostly trained on one type of data for each model. 00:08:27.540 |
But the human brain, we get all of our different senses 00:08:37.940 |
I think when you get that kind of integrated data set 00:08:44.420 |
that can actually close the loop and interact, 00:08:50.460 |
from what we have today learn really interesting things 00:08:54.380 |
and train them on a large amount of multimodal data. 00:08:59.620 |
but within, like you're working adversarial examples, 00:09:04.020 |
so selecting within modal, within one mode of data, 00:09:09.020 |
selecting better what are the difficult cases 00:09:16.140 |
- Oh yeah, like could we get a whole lot of mileage 00:09:22.260 |
to adversarial examples or something like that? 00:09:32.740 |
I was thinking of it mostly as adversarial examples 00:09:43.700 |
respond to adversarial examples and how humans respond. 00:09:49.180 |
I still think that adversarial examples are important. 00:09:51.940 |
I think of them now more of as a security liability 00:09:57.780 |
there's something uniquely wrong with machine learning 00:10:06.460 |
Not on the security side, but literally just accuracy. 00:10:10.780 |
- I do see them as a kind of tool on that side, 00:10:13.460 |
but maybe not quite as much as I used to think. 00:10:16.660 |
We've started to find that there's a trade-off 00:10:29.060 |
that showed resistance to some kinds of adversarial examples, 00:10:33.020 |
it also got better at the clean data on MNIST. 00:10:39.020 |
that when we train against weak adversarial examples, 00:10:43.900 |
So far, that hasn't really held up on other data sets 00:11:00.540 |
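A minimal sketch of the adversarial-training recipe being discussed, using the fast gradient sign method from Goodfellow's "Explaining and Harnessing Adversarial Examples"; the tiny model, random stand-in data, and epsilon value below are illustrative placeholders, not the MNIST setup from the conversation:

```python
import torch
import torch.nn as nn

# Placeholder model and data standing in for an MNIST classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                      nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
eps = 0.1                                   # illustrative perturbation size

x = torch.rand(64, 1, 28, 28)               # stand-in for an image batch
y = torch.randint(0, 10, (64,))

for step in range(10):
    # Craft adversarial examples: nudge each pixel by eps in the direction
    # that increases the loss (the "fast gradient sign" step).
    x_adv = x.clone().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

    # Train on a mix of clean and adversarial examples.
    opt.zero_grad()
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
    loss.backward()
    opt.step()
```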
'cause it feels like that's how us humans learn 00:11:11.020 |
It's also, in a lot of branches of engineering, 00:11:15.820 |
and make sure that your system will work in the worst case. 00:11:23.580 |
that happen when you go out into a really randomized world. 00:11:27.420 |
- Yeah, with driving with autonomous vehicles, 00:11:36.900 |
And if you can be robust to all those difficult cases, 00:11:49.100 |
isn't really focused on a particular use case, 00:11:54.020 |
where you'd like to make sure that the adversary 00:11:56.940 |
can't interfere with the operation of your system. 00:12:01.060 |
if you have an algorithm making trades for you, 00:12:17.140 |
because you don't want people to make adversarial examples 00:12:19.500 |
that fool your algorithm into making bad trades. 00:12:26.580 |
in the academic literature is speech recognition. 00:12:30.180 |
If you use speech recognition to hear an audio waveform 00:12:47.820 |
doesn't realize that something like that is happening. 00:13:10.780 |
they could make sounds that are not understandable 00:13:13.780 |
by a human, but are recognized as the target phrase 00:13:18.420 |
that the attacker wants the phone to recognize it as. 00:13:21.340 |
Since then, things have gotten a little bit better 00:13:24.020 |
on the attacker side and worse on the defender side. 00:13:35.580 |
but are actually interpreted as a different sentence 00:13:42.740 |
of the adversarial perturbation is still kind of high. 00:13:48.180 |
it sounds like there's some noise in the background, 00:13:55.540 |
that makes the phone hear a completely different sentence. 00:14:01.620 |
the deep learning chapter for the fourth edition 00:14:04.260 |
of the "Artificial Intelligence, a Modern Approach" book. 00:14:19.180 |
Is it, even having written a full length textbook before, 00:14:22.660 |
it's still pretty intimidating to try to start writing 00:14:42.300 |
that were maybe extraneous in the first book. 00:14:49.420 |
and what seems a little bit less important to have included 00:15:00.580 |
to the point where some core ideas from the 1980s 00:15:04.780 |
When I first started studying machine learning, 00:15:06.660 |
almost everything from the 1980s had been rejected 00:15:11.340 |
So that stuff that's really stood the test of time 00:15:15.940 |
There's also, I guess, two different philosophies 00:15:23.140 |
One philosophy is you try to write a reference 00:15:32.420 |
and tells them what the most important concepts are. 00:15:45.780 |
Writing this chapter for Russell and Norvig's book, 00:15:48.940 |
I was able to focus more on just a concise introduction 00:15:55.980 |
In a lot of cases, I actually just wrote paragraphs 00:16:01.900 |
"It's pointless to try to tell you what the latest 00:16:04.660 |
"and best version of a learn-to-learn model is." 00:16:09.660 |
I can point you to a paper that's recent right now, 00:16:24.980 |
You should know that learning-to-learn is a thing 00:16:32.220 |
or recurrent net module that you would want to use 00:16:36.060 |
But there isn't a lot of point in trying to summarize 00:16:38.180 |
exactly which architecture and which learning approach 00:16:58.700 |
algorithms and data structures algorithms course. 00:17:03.740 |
I remember the professor asked, "What is an algorithm?" 00:17:14.100 |
Everybody knew what an algorithm was, it was a graduate course. 00:17:23.620 |
- I would say deep learning is any kind of machine learning 00:17:36.020 |
So that would mean shallow learning is things 00:17:39.620 |
where you learn a lot of operations that happen in parallel. 00:17:43.780 |
You might have a system that makes multiple steps, 00:17:46.740 |
like you might have hand-designed feature extractors, 00:17:52.660 |
Deep learning is anything where you have multiple operations 00:17:59.820 |
like convolutional networks and recurrent networks, 00:18:10.900 |
Today I hear a lot of people define deep learning 00:18:21.500 |
And I think that's a legitimate usage of the term. 00:18:31.780 |
that is not gradient descent and differentiable functions? 00:18:39.820 |
what's your thought about that space of approaches? 00:18:44.340 |
- Yeah, so I tend to think of machine learning algorithms 00:18:46.380 |
as decomposed into really three different pieces. 00:18:50.220 |
There's the model, which can be something like a neural net 00:18:56.620 |
And that basically just describes how do you take data 00:19:01.140 |
And what function do you use to make a prediction 00:19:12.380 |
or not every algorithm can be really described 00:19:15.900 |
but what's the algorithm for updating the parameters 00:19:18.860 |
or updating whatever the state of the network is? 00:19:29.180 |
as it comes into your machine learning system? 00:19:32.100 |
So I think of deep learning as telling us something 00:19:41.260 |
I say that it just has to have multiple layers. 00:19:46.340 |
in a feed-forward differentiable computation. 00:19:49.220 |
That can be multiple layers in a graphical model. 00:19:52.020 |
There's a lot of ways that you could satisfy me 00:20:01.900 |
how do you actually update the parameters piece? 00:20:07.540 |
and training it with something like evolution 00:20:11.300 |
And I would say that still qualifies as deep learning. 00:20:35.820 |
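A minimal sketch of that point, on an assumed toy XOR task: the model below has multiple sequential layers (so it is "deep" in the sense used here), but its parameters are fit with a simple evolution strategy rather than gradient descent. All layer sizes and the mutation scale are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])          # XOR targets

def forward(params, x):
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)                  # layer 1
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # layer 2, sigmoid output

def loss(params):
    return np.mean((forward(params, X).ravel() - y) ** 2)

def init_params():
    return [rng.normal(0, 1, (2, 8)), np.zeros(8),
            rng.normal(0, 1, (8, 1)), np.zeros(1)]

best = init_params()
for generation in range(5000):
    # Mutate every parameter tensor with Gaussian noise and keep the child
    # only if it scores better: a (1+1) evolution strategy, no gradients used.
    child = [p + 0.1 * rng.normal(size=p.shape) for p in best]
    if loss(child) < loss(best):
        best = child

print(np.round(forward(best, X).ravel(), 2))  # typically moves toward [0, 1, 1, 0]
```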
- So it's the steps of processing that's key. 00:20:38.980 |
So Jeff Hinton suggests that we need to throw away 00:20:59.220 |
isn't on the critical path to research for improving AI, 00:21:04.660 |
It just becomes used for some specialized set of things. 00:21:14.020 |
who are working on things like speech recognition 00:21:18.460 |
But there's still a lot of use for logistic regression 00:21:30.740 |
So I think back propagation and gradient descent 00:21:33.500 |
are around to stay, but they may not end up being 00:21:37.500 |
everything that we need to get to real human level 00:21:44.780 |
back propagation has been around for a few decades. 00:21:50.260 |
So are you optimistic about us as a community 00:21:57.660 |
I think we likely will find something that works better. 00:22:01.820 |
You could imagine things like having stacks of models 00:22:07.580 |
predict parameters of the higher level models. 00:22:14.460 |
but just predicting how different values will perform. 00:22:17.700 |
You can kind of see that already in some areas 00:22:27.700 |
for things like hyper parameter optimization. 00:22:41.180 |
and having it really advance the state of the art 00:23:08.460 |
working quite right yet is like short-term memory. 00:23:21.820 |
Like gradient descent to learn a specific fact 00:23:29.420 |
Like if I tell you the meeting today is at 3 p.m., 00:23:35.500 |
I don't have to tell you over and over again, it's at 3 p.m., it's at 3 p.m., it's at 3 p.m., 00:23:37.820 |
it's at 3 p.m. for you to do a gradient step on each one. 00:23:52.220 |
and update themselves with facts like that right away. 00:23:54.900 |
But I don't think we've really nailed it yet. 00:24:08.820 |
updating the state of a machine learning system 00:24:16.980 |
- So some of the success of symbolic systems in the '80s 00:24:21.420 |
is they were able to assemble these kinds of facts better. 00:24:33.700 |
as something that we'll have to return to eventually, 00:24:51.180 |
which has mostly been machine learning security 00:24:56.740 |
I haven't usually found myself moving in that direction. 00:25:00.540 |
For generative models, I could see a little bit of, 00:25:16.860 |
- I mean, neural network is kind of like that. 00:25:19.020 |
It's a differentiable knowledge base of sorts. 00:25:23.620 |
- If we had a really easy way of giving feedback 00:25:29.260 |
that would clearly help a lot with generative models. 00:25:32.380 |
And so you could imagine one way of getting there 00:25:33.900 |
would be get a lot better at natural language processing. 00:25:44.060 |
- Being able to have a chat with a neural network. 00:25:47.860 |
So like one thing in generative models we see a lot today 00:25:49.980 |
is you'll get things like faces that are not symmetrical, 00:25:53.540 |
like people that have two eyes that are different colors. 00:26:00.820 |
but not nearly as many of them as you tend to see 00:26:10.180 |
people's faces are generally approximately symmetric 00:26:30.140 |
without bringing back some of the 1980s technology, 00:26:32.180 |
but I also see some ways that you could imagine 00:26:49.580 |
GANs would work, generative adversarial networks, 00:27:09.300 |
What was the basis of your intuition why it should work? 00:27:15.980 |
promoting alcohol for the purposes of science, 00:27:32.460 |
that I'm less prone to shooting down some of my own ideas 00:27:49.820 |
was that trying to train two neural nets at the same time 00:28:03.180 |
would not be able to generate anything reasonable, 00:28:08.260 |
- Yeah, so part of what all of us were thinking about 00:28:11.360 |
when we had this conversation was deep Boltzmann machines, 00:28:16.980 |
were a big fan of deep Boltzmann machines at the time. 00:28:31.180 |
and tell the model to make the data more likely. 00:28:37.020 |
and tell the model to make those samples less likely. 00:28:43.960 |
You have to actually run an iterative process 00:28:53.900 |
you're always running these two systems at the same time. 00:28:57.180 |
One that's updating the parameters of the model 00:28:58.940 |
and another one that's trying to generate samples 00:29:01.680 |
And they worked really well on things like MNIST, 00:29:07.500 |
to scale past MNIST to things like generating color photos. 00:29:18.740 |
a lot of people thought that the discriminator 00:29:25.340 |
That trying to train the discriminator in the inner loop, 00:29:41.940 |
- A lot of the time with machine learning algorithms, 00:29:46.900 |
You have to just run the experiment and see what happens. 00:29:49.140 |
And I would say I still today don't have one factor 00:29:54.740 |
"This is why GANs worked for photo generation 00:30:03.300 |
showing that under some theoretical settings, 00:30:14.140 |
that they don't necessarily explain the whole picture 00:30:17.540 |
in terms of all the results that we see in practice. 00:30:22.300 |
can you, in the same way as we talked about deep learning, 00:30:24.860 |
can you tell me what generative adversarial networks are? 00:30:33.980 |
A generative model is a machine learning model 00:30:38.860 |
Like say you have a collection of photos of cats 00:30:41.220 |
and you want to generate more photos of cats, 00:30:43.980 |
or you want to estimate a probability distribution over cats 00:30:55.800 |
Some generative models are good at creating new data. 00:30:59.180 |
Other generative models are good at estimating 00:31:06.580 |
to come from the same distribution as the training data. 00:31:15.620 |
There are some kinds of GANs like FlowGAN that can do both, 00:31:18.500 |
but mostly GANs are about generating samples, 00:31:21.620 |
generating new photos of cats that look realistic. 00:31:41.020 |
It isn't doing something like compositing photos together. 00:31:44.540 |
You're not literally taking the eye off of one cat 00:32:01.980 |
What's specific to GANs is that we have a two-player game 00:32:10.340 |
one of them becomes able to generate realistic data. 00:32:16.140 |
It produces output data, such as just images, for example. 00:32:25.140 |
The other player is called the discriminator. 00:32:50.980 |
at recognizing whether images are real or fake. 00:32:59.620 |
And you can analyze this through the language of game theory 00:33:18.740 |
because all the samples coming from both the data 00:33:28.380 |
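The two-player game he describes can be written as the minimax objective from the original GAN paper, with generator G, discriminator D, data distribution p_data, and noise prior p_z:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\left(1 - D(G(z))\right)\right]
```

At the equilibrium of this game, the generator's distribution matches the data distribution and the discriminator outputs 1/2 everywhere, which is the game-theoretic sense in which the generated samples become indistinguishable from real data.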
and does it just blow your mind that this thing works? 00:33:33.380 |
so it's able to estimate the identity function 00:33:44.220 |
how does this even, why, this is quite incredible, 00:33:55.460 |
that if they really did what we asked them to do, 00:33:58.860 |
they would do nothing but memorize the training data. 00:34:01.940 |
- Models that are based on maximizing the likelihood, 00:34:05.780 |
the way that you obtain the maximum likelihood 00:34:15.140 |
For GANs, the game is played using a training set. 00:34:18.420 |
So the way that you become unbeatable in the game 00:34:33.060 |
for the generator to memorize the training data, 00:34:42.180 |
for why it would require quite a lot of learning steps 00:34:47.180 |
and a lot of observations of different latent variables 00:35:03.740 |
And I don't think we really have a good answer for that, 00:35:10.260 |
and how few images the generative model sees during training. 00:35:22.740 |
training them to memorize rather than generalize. 00:35:30.860 |
where they show that you can take a convolutional net 00:35:33.100 |
and you don't even need to learn the parameters of it at all, 00:35:37.700 |
And it's already useful for things like in-painting images. 00:35:54.060 |
That would imply that it would be much harder 00:36:01.300 |
So far, we're able to make reasonable speech models 00:36:11.500 |
see a lot of deep learning models of biology data sets, 00:36:26.900 |
turns out to really rely heavily on the model architecture. 00:36:30.140 |
And we were able to do what we did for vision 00:36:33.020 |
by trying to reverse engineer the human visual system. 00:36:39.820 |
use that same trick for arbitrary kinds of data. 00:36:42.580 |
- Right, so there's aspects of the human vision system, 00:36:51.140 |
just makes it really effective at detecting the patterns 00:37:06.300 |
and what other generative models besides GANs are there? 00:37:10.100 |
- Yeah, so it's maybe a little bit easier to start with 00:37:13.540 |
what kinds of generative models are there other than GANs. 00:37:16.900 |
So most generative models are likelihood-based, 00:37:20.900 |
where to train them, you have a model that tells you 00:37:24.900 |
how much probability it assigns to a particular example, 00:37:33.700 |
It turns out that it's hard to design a model 00:37:46.220 |
the likelihood function from a computational point of view. 00:37:53.820 |
write down intuitively, it turns out that it's almost 00:37:56.420 |
impossible to calculate the amount of probability 00:38:00.780 |
So there's a few different schools of generative models 00:38:06.260 |
One approach is to very carefully design the model 00:38:12.780 |
to measure the density it assigns to a particular point. 00:38:15.540 |
So there are things like autoregressive models, 00:38:28.660 |
So for an image, you estimate the probability 00:38:31.540 |
of each pixel, given all of the pixels that came before it. 00:38:37.660 |
the density function, you can actually calculate 00:38:40.620 |
the density for all these pixels more or less in parallel. 00:38:43.500 |
Generating the image still tends to require you 00:38:46.860 |
to go one pixel at a time, and that can be very slow. 00:39:07.460 |
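A toy sketch of the autoregressive pattern described here, over a short sequence of binary "pixels"; the conditional-probability rule is a made-up stand-in for a trained PixelCNN/PixelRNN, chosen only to show why density evaluation is straightforward while sampling is inherently one step at a time:

```python
import numpy as np

def cond_prob_of_one(prefix):
    # Hypothetical conditional p(next pixel = 1 | pixels so far); a real
    # autoregressive model would compute this with a masked neural net.
    if len(prefix) == 0:
        return 0.5
    return 0.25 + 0.5 * np.mean(prefix)

def log_likelihood(x):
    # Every conditional depends only on observed pixels, so (with a masked
    # convolutional model) all terms can be computed in parallel; the loop
    # here is just for clarity.
    logp = 0.0
    for i, xi in enumerate(x):
        p1 = cond_prob_of_one(x[:i])
        logp += np.log(p1 if xi == 1 else 1.0 - p1)
    return logp

def sample(n_pixels, rng=np.random.default_rng(0)):
    # Sampling is sequential: pixel i can only be drawn after pixels 0..i-1
    # exist, which is why autoregressive generation tends to be slow.
    x = []
    for _ in range(n_pixels):
        x.append(int(rng.random() < cond_prob_of_one(x)))
    return x

img = sample(16)
print(img, log_likelihood(img))
```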
are from GANs these days, but it can be hard to tell 00:39:14.700 |
which type of algorithm, if that makes sense. 00:39:17.300 |
- The amount of effort invested in a particular-- 00:39:21.420 |
So a lot of people who've traditionally been excited 00:39:28.740 |
are GANs doing better because they have a lot of 00:39:38.900 |
or are GANs doing better because they prioritize 00:39:45.500 |
I think all of those are potentially valid explanations, 00:40:00.980 |
In the first paper, we just showed that GANs basically work. 00:40:15.020 |
- We used MNIST, which is little handwritten digits. 00:40:32.980 |
which is things like very small 32 by 32 pixels 00:40:40.660 |
For that, we didn't get recognizable objects, 00:40:46.180 |
were really used to looking at these failed samples 00:40:50.420 |
And people who are used to reading the tea leaves 00:40:53.020 |
recognize that our tea leaves at least look different. 00:41:06.180 |
by Emily Denton and Soumith Chintala at Facebook AI Research, 00:41:10.900 |
where they actually got really good high-resolution photos 00:41:16.580 |
They had a complicated system where they generated 00:41:18.860 |
the image starting at low-res and then scaling up to high-res, 00:41:24.900 |
And then in 2015, I believe, later that same year, 00:41:29.900 |
Alec Radford and Soumith Chintala and Luke Metz 00:41:46.420 |
and even some before that were deep and convolutional, 00:41:50.220 |
for a really great recipe where they were able to actually, 00:41:54.020 |
using only one model instead of a multi-step process, 00:42:07.380 |
Like, once you had animals that had a backbone, 00:42:09.740 |
you suddenly got lots of different versions of fish 00:42:12.900 |
and four-legged animals and things like that. 00:42:23.140 |
And so from there, I would say some interesting things 00:42:30.940 |
of standard image generation GANs has increased, 00:42:40.060 |
One thing is that you can use them to learn classifiers 00:42:44.580 |
without having to have class labels for every example 00:42:51.780 |
My colleague at OpenAI, Tim Salimans, who's at Brain now, 00:42:55.820 |
wrote a paper called "Improved Techniques for Training GANs." 00:43:00.900 |
but I can't claim any credit for this particular part. 00:43:07.820 |
and use it as a classifier that actually tells you, 00:43:11.340 |
you know, this image is a cat, this image is a dog, 00:43:13.620 |
this image is a car, this image is a truck, and so on. 00:43:16.420 |
Not just to say whether the image is real or fake, 00:43:22.620 |
And he found that you can train these classifiers 00:43:25.340 |
with far fewer labeled examples than traditional classifiers. 00:43:35.300 |
but your ability to classify, you're going to do much, 00:43:46.340 |
you want to look at an image of a handwritten digit 00:43:48.860 |
and say whether it's a zero, a one, or a two, and so on. 00:44:02.780 |
In 2016, with this semi-supervised GAN project, 00:44:21.100 |
but he doesn't need to have each of them labeled as, 00:44:23.460 |
you know, this one's a one, this one's a two, 00:44:27.020 |
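A small sketch of the mechanism from "Improved Techniques for Training GANs" that makes this possible: the discriminator outputs K class logits, and the probability that an input is real (rather than generated) is derived from those same logits, so labeled, unlabeled, and generated examples can all contribute to training. The numbers below are purely illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def p_real(class_logits):
    # p(real | x) = Z(x) / (Z(x) + 1), with Z(x) = sum_k exp(logit_k);
    # the "fake" class acts as an implicit extra logit fixed at 0.
    Z = np.exp(class_logits).sum(axis=-1)
    return Z / (Z + 1.0)

logits = np.array([[2.0, 0.1, -1.0],     # a labeled real example (3 classes)
                   [-2.0, -1.5, -2.5]])  # a generated example
print(softmax(logits))   # supervised loss uses these class probabilities
print(p_real(logits))    # unsupervised real-vs-fake loss uses these
```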
- Then to be able to, for GANs to be able to generate 00:44:30.020 |
recognizable objects, so objects from a particular class, 00:44:49.060 |
on semi-supervised GANs where their goal isn't to classify, 00:44:58.700 |
They were working off of DeepMind's BigGAN project, 00:45:02.420 |
and they showed that they can match the performance 00:45:20.260 |
with only having about 10% of the images labeled. 00:45:24.620 |
And they do that essentially using a clustering algorithm 00:45:29.900 |
where the discriminator learns to assign the objects 00:45:36.340 |
that objects can be grouped into similar types 00:45:47.980 |
has to come from one of these archetypal groups 00:45:55.140 |
you tend to get things that look sort of like 00:46:00.500 |
but without necessarily a lot going on in them. 00:46:07.900 |
the object doesn't necessarily occupy the whole image. 00:46:11.260 |
And so you learn to create realistic sets of pixels, 00:46:20.140 |
and you want it to be in every image you make. 00:46:27.060 |
and how it turns out, again, thought-provoking, 00:46:35.740 |
So when you're doing that kind of generation, 00:46:38.220 |
you're going to end up generating greener horses or whatever. 00:46:52.360 |
So are there other types of games you come across 00:46:55.060 |
in your mind that neural networks can play with each other 00:47:05.220 |
- Yeah, the one that I spend most of my time on 00:47:07.700 |
is in security, you can model most interactions as a game 00:47:12.700 |
where there's attackers trying to break your system 00:47:15.820 |
and you're the defender trying to build a resilient system. 00:47:27.260 |
The authors had the idea before the GAN paper came out, 00:47:33.780 |
and they were very nice and cited the GAN paper, 00:47:44.340 |
a machine learning model in one setting called a domain 00:47:50.300 |
And you would like it to perform well in the new domain, 00:47:58.500 |
on a really clean image data set like ImageNet, 00:48:03.380 |
where the user is taking pictures in the dark 00:48:07.820 |
and just pictures that aren't really centered 00:48:11.340 |
When you take a normal machine learning model, 00:48:22.140 |
Domain adaptation algorithms try to smooth out that gap. 00:48:32.180 |
regardless of which domain you extracted them on. 00:48:36.900 |
you have one player that's a feature extractor 00:48:39.180 |
and another player that's a domain recognizer. 00:48:42.100 |
The domain recognizer wants to look at the output 00:48:45.740 |
and guess which of the two domains the features came from. 00:49:02.500 |
into not knowing which domain the data came from 00:49:05.380 |
and also extract features that are good for classification. 00:49:22.900 |
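A minimal sketch of that two-player setup in the style of domain-adversarial neural networks, where a gradient-reversal layer makes the feature extractor fool the domain recognizer while still serving the label classifier; layer sizes, batch sizes, and the random stand-in data are illustrative only:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -grad   # flip the gradient flowing back into the extractor

features    = nn.Sequential(nn.Linear(20, 64), nn.ReLU())  # feature extractor
label_head  = nn.Linear(64, 10)   # task classifier (e.g., object classes)
domain_head = nn.Linear(64, 2)    # domain recognizer: source vs. target

opt = torch.optim.Adam(list(features.parameters()) +
                       list(label_head.parameters()) +
                       list(domain_head.parameters()), lr=1e-3)

x_src = torch.randn(32, 20); y_src = torch.randint(0, 10, (32,))  # labeled source
x_tgt = torch.randn(32, 20)                                       # unlabeled target

for step in range(100):
    f_src, f_tgt = features(x_src), features(x_tgt)
    task_loss = nn.functional.cross_entropy(label_head(f_src), y_src)

    # Domain recognizer tries to tell the domains apart; because of the
    # reversed gradient, the extractor is pushed to make them look alike.
    d_in  = torch.cat([GradReverse.apply(f_src), GradReverse.apply(f_tgt)])
    d_lab = torch.cat([torch.zeros(32, dtype=torch.long),
                       torch.ones(32, dtype=torch.long)])
    domain_loss = nn.functional.cross_entropy(domain_head(d_in), d_lab)

    loss = task_loss + domain_loss
    opt.zero_grad(); loss.backward(); opt.step()
```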
in order to make things work the same in both domains, 00:49:35.460 |
- Yeah, one thing you could hope for with GANs 00:49:38.100 |
is you could imagine I've got a limited training set 00:49:52.380 |
And then maybe the classifier would perform better 00:50:03.060 |
I've never heard of that particular approach working, 00:50:05.460 |
but I think there's some closely related things 00:50:14.100 |
So if we think a little bit about what we'd be hoping for 00:50:15.820 |
if we use the GAN to make more training data, 00:50:18.220 |
we're hoping that the GAN will generalize to new examples 00:50:22.060 |
better than the classifier would have generalized 00:50:39.140 |
that I haven't personally tried, but someone could try 00:50:43.380 |
of different generative models on the same training set, 00:50:50.540 |
Because each of the generative models might generalize 00:50:54.420 |
they might capture many different axes of variation 00:50:58.820 |
And then the classifier can capture all of those ideas 00:51:10.060 |
The other thing that GANs are really good for 00:51:19.340 |
but by generating new data that has different properties 00:51:26.220 |
is you can create differentially private data. 00:51:29.100 |
So suppose that you have something like medical records, 00:51:33.820 |
on the medical records and then publish the classifier, 00:51:36.460 |
because someone might be able to reverse engineer 00:51:48.980 |
still have the same differential privacy guarantees 00:51:57.220 |
and they can do almost anything they want with that data 00:52:06.460 |
on how much the original people's data has been protected. 00:52:25.700 |
- Yeah, so there's a paper from Amos Storkey's lab 00:52:31.380 |
that are incapable of using specific variables. 00:52:34.780 |
So say, for example, you wanted to make predictions 00:52:55.620 |
that can still take in a lot of different attributes 00:52:58.980 |
and make a really accurate, informed prediction, 00:53:02.540 |
but be confident that it isn't reverse engineering gender 00:53:12.820 |
where you have one player that's a feature extractor 00:53:16.100 |
and another player that's a feature analyzer. 00:53:19.060 |
And you want to make sure that the feature analyzer 00:53:21.420 |
is not able to guess the value of the sensitive variable 00:53:31.620 |
you're not able to infer the sensitive variables. 00:53:39.460 |
- Another way I think that GANs in particular 00:53:51.140 |
We've seen cycle GAN turning horses into zebras. 00:53:53.860 |
We've seen other unsupervised GANs made by Ming-Yu Liu 00:53:58.860 |
doing things like turning day photos into night photos. 00:54:04.780 |
you could imagine taking records for people in one group 00:54:08.420 |
and transforming them into analogous people in another group 00:54:11.500 |
and testing to see if they're treated equitably 00:54:16.420 |
There's a lot of things that'd be hard to get right 00:54:18.060 |
to make sure that the conversion process itself is fair. 00:54:25.380 |
But if you could design that conversion process 00:54:27.100 |
very carefully, it might give you a way of doing audits 00:54:30.500 |
where you say, what if we took people from this group, 00:54:33.100 |
converted them into equivalent people in another group? 00:54:35.420 |
Does the system actually treat them how it ought to? 00:54:41.740 |
In popular press and in general, in our imagination, 00:54:46.740 |
you think, well, GANs are able to generate data 00:54:54.500 |
or being able to sort of maliciously generate data 00:55:03.140 |
Is this something, if you look 10, 20 years into the future, 00:55:13.540 |
- I'm a lot less concerned about 20 years from now 00:55:17.380 |
I think there will be a kind of bumpy cultural transition 00:55:26.260 |
I think 20 years from now, people will mostly understand 00:55:34.060 |
People will expect to see that it's been cryptographically 00:55:36.700 |
signed or have some other mechanism to make them believe 00:55:47.620 |
that provides a lot of mechanisms for authenticating 00:55:51.980 |
They're maybe not quite up to having a state actor 00:55:59.820 |
but it's something that people are already working on 00:56:04.140 |
- So you think authentication will eventually win out? 00:56:08.300 |
So being able to authenticate that this is real 00:56:13.300 |
- As opposed to GANs just getting better and better 00:56:15.780 |
or generative models being able to get better and better 00:56:18.220 |
to where the nature of what is real is normal. 00:56:21.500 |
- I don't think we'll ever be able to look at the pixels 00:56:24.460 |
of a photo and tell you for sure that it's real or not real. 00:56:28.580 |
And I think it would actually be somewhat dangerous 00:56:36.820 |
and then someone's able to fool your fake detector 00:56:38.900 |
and your fake detector says this image is not fake, 00:56:59.580 |
I also think we will likely get better authentication systems 00:57:07.380 |
cryptographically signs everything that comes out of it. 00:57:17.700 |
who knew the appropriate private key for this phone 00:57:24.340 |
and upload it to this server at this timestamp. 00:57:31.380 |
that have the private keys hardware embedded in them. 00:57:42.540 |
or break open the chip and learn the private key 00:57:47.460 |
for an adversary with fewer resources to fake things. 00:57:53.700 |
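A hypothetical sketch of the signing scheme he outlines, with an Ed25519 key pair standing in for a key embedded in the camera hardware; the workflow and names are illustrative, not an existing phone API:

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric import ed25519

# Hypothetical device key; in the scheme described it would live in secure
# hardware inside the phone, with the public key published or certified.
device_key = ed25519.Ed25519PrivateKey.generate()
public_key = device_key.public_key()

photo_bytes = b"...raw image data..."          # stand-in for a captured photo
digest = hashlib.sha256(photo_bytes).digest()
signature = device_key.sign(digest)            # attached to the photo as metadata

# Later, a verifier checks that the photo came from that device and was not
# modified; verify() raises InvalidSignature if the bytes were tampered with.
public_key.verify(signature, digest)
print("photo authenticated")
```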
So you mentioned the beer and the bar and the new ideas. 00:58:04.420 |
Do you think there's still many such groundbreaking ideas 00:58:07.740 |
in deep learning that could be developed so quickly? 00:58:11.020 |
- Yeah, I do think that there are a lot of ideas 00:58:14.860 |
GANs were probably a little bit of an outlier 00:58:25.580 |
on the algorithm scale and get a big payback. 00:58:28.820 |
I think it's not as likely that you'll see that 00:58:31.900 |
in terms of things like core machine learning technologies 00:58:42.420 |
it would be a lot harder to prove that it was useful 00:58:46.940 |
because I would need to get it running on something 00:58:57.580 |
and know that it was something really new and exciting. 00:59:03.260 |
But there are other areas of machine learning 00:59:06.780 |
where I think a new idea could actually be developed 00:59:17.740 |
- Yeah, so I think fairness and interpretability 00:59:23.140 |
are areas where we just really don't have any idea 00:59:30.380 |
I don't think we even have the right definitions. 00:59:32.740 |
And even just defining a really useful concept, 00:59:40.100 |
We've seen that, for example, in differential privacy 00:59:48.060 |
where before a lot of things are really mushy 00:59:51.620 |
you could actually design randomized algorithms 00:59:56.220 |
that they preserved individual people's privacy 01:00:01.820 |
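For reference, the precise definition being alluded to: a randomized mechanism M is (epsilon, delta)-differentially private if, for every pair of datasets D and D' differing in one person's record and every set of outputs S,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta .
```

Having this kind of crisp, checkable statement is what made it possible to design algorithms with provable privacy guarantees, which is the property he suggests interpretability definitions currently lack.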
Right now, we all talk a lot about how interpretable 01:00:11.300 |
of what interpretability means in their head. 01:00:13.860 |
If we could define some concept related to interpretability 01:00:20.620 |
even without a new algorithm that increases that quantity. 01:00:24.180 |
And also once we had the definition of differential privacy, 01:00:28.780 |
it was fast to get the algorithms that guaranteed it. 01:00:31.380 |
So you could imagine once we have definitions 01:00:37.580 |
that have the interpretability guarantees quickly too. 01:00:40.540 |
- What do you think it takes to build a system 01:00:48.660 |
as we quickly venture into the philosophical? 01:00:55.620 |
- I think that it definitely takes better environments 01:01:08.740 |
I also think it's gonna take really a lot of computation. 01:01:29.740 |
or by thinking really hard about the problem. 01:01:32.140 |
I think that the agent really needs to interact 01:01:35.900 |
and have a variety of experiences within the same lifespan. 01:01:53.500 |
to perform well in many different RL environments, 01:01:57.020 |
but we don't really have anything like an agent 01:01:59.540 |
that goes seamlessly from one type of experience to another 01:02:02.940 |
and really integrates all the different things 01:02:16.780 |
Like all of them are playing like an action-based video game. 01:02:23.220 |
playing a video game to like reading the Wall Street Journal 01:02:27.500 |
to predicting how effective a molecule will be as a drug 01:02:41.700 |
natural conversation being a good benchmark for intelligence. 01:02:59.780 |
So imagine that instead of having to go to the CIFAR website 01:03:07.940 |
and then write a Python script to parse it and all that, 01:03:11.340 |
you could just point an agent at the CIFAR 10 problem 01:03:19.180 |
and trains a model and starts giving you predictions. 01:03:22.420 |
I feel like something that doesn't need to have 01:03:45.740 |
if something knows how to pre-process the data 01:03:49.580 |
so that it successfully accomplishes the task, 01:03:59.540 |
that that's the philosophical definition of intelligence, 01:04:02.260 |
but that's something that would be really cool to build, 01:04:03.780 |
that would be really useful and would impress me 01:04:05.580 |
and would convince me that we've made a step forward 01:04:13.380 |
and then next day expect it to be able to solve CIFAR-10. 01:04:22.180 |
and it figures out what web searches it should run 01:04:28.300 |
- So you have a very clear, calm way of speaking, 01:04:40.220 |
have been identified as both potentially being robots. 01:04:44.180 |
If you have to prove to the world that you are indeed human, 01:04:48.180 |
- I can understand thinking that I'm a robot. 01:04:53.180 |
- It's the flip side of the Turing test, I think. 01:05:13.860 |
- Proving that I'm not a robot with today's technology, 01:05:20.780 |
into talking about the stock market or something 01:05:39.100 |
to a separate channel to prove that something is real. 01:05:45.540 |
on a blockchain when I was born or something, 01:05:52.980 |
- So what, last question, problem stands out for you 01:05:59.940 |
- So I think resistance to adversarial examples, 01:06:02.940 |
figuring out how to make machine learning secure 01:06:05.540 |
against an adversary who wants to interfere it 01:06:07.500 |
and control it, that is one of the most important things 01:06:12.180 |
- In all domains, image, language, driving, and everything. 01:06:30.660 |
what are the important problems in security of phones 01:06:35.140 |
in like 2002, I don't think we would have anticipated 01:06:38.940 |
that we're using them for nearly as many things 01:06:44.900 |
that you can kind of try to speculate about where it's going 01:06:47.940 |
but really the business opportunities that end up taking off 01:06:56.460 |
almost anything you can do with machine learning, 01:06:58.380 |
you would like to make sure that people can't get it 01:07:02.140 |
to do what they want rather than what you want 01:07:04.660 |
just by showing it a funny QR code or a funny input pattern. 01:07:08.540 |
- And you think that the set of methodology to do that 01:07:22.820 |
that I'm excited about today is making dynamic models 01:07:25.740 |
that change every time they make a prediction. 01:07:31.180 |
and then after they're trained, we freeze them 01:07:33.180 |
and we just use the same rule to classify everything 01:07:38.260 |
That's really a sitting duck from a security point of view. 01:07:41.580 |
If you always output the same answer for the same input, 01:07:48.340 |
until they find a mistake that benefits them. 01:07:53.260 |
I think having a model that updates its predictions 01:07:56.580 |
so that it's harder to predict what you're gonna get 01:08:06.180 |
- Yeah, models that maintain a bit of a sense of mystery