What is Statistics? (Michael I. Jordan) | AI Podcast Clips
Chapters
0:03 What Is Statistics
1:38 Inverse Probability
4:44 Decision Theory
5:25 Bayesian Frequentist
7:26 Empirical Bayes
8:26 False Discovery Rate
- An absurd question, but what is statistics?
- It's somewhere between math and science and technology. It's a set of principles that allow you to make inferences that have got some reason to be believed, and also principles that allow you to make decisions where your probability of making an error will be small, your probability of continuing to not make errors is controlled, and the probability that you found something that's real is high.
It kind of goes back as a formal discipline to the era when probability was developed, sort of especially to explain gambling situations.
- So you would say, well, given the state of nature is this, here is what I should expect to see, especially if I do things over long periods of time. And the physicists started to pay attention to this. And then there's the inverse question: from the outcomes I observe, could I infer what the underlying mechanism was?
Laplace, for example, needed to do a census of France, and he analyzed that data to determine policy. People in that era didn't think of themselves as statisticians. And so von Neumann is developing game theory, but also thinking of that as decision theory. Wald is an econometrician developing decision theory. And to this day, in most advanced statistical curricula, you teach decision theory as the starting point. And then it branches out into the two branches, Bayesian and frequentist.
- What's the most mysterious, maybe surprising idea that you've come across?
- There's something that's way too technical for this conversation, and it really takes time to wrap your head around. Let me just say, a colleague, Stephen Stigler at the University of Chicago, wrote a really beautiful paper. It kind of defeats the mind's attempts to understand it, but you can, and Steve has a nice perspective on that.
The way I think about it is that it's like in physics, or in quantum physics: there's a wave and particle duality, and you don't really quite understand the relationship. The electron's a wave and the electron's a particle. There's Bayesian ways of thinking and frequentist ways of thinking. They sometimes become sort of the same in practice, and then in some practice they are not the same at all. And so it is very much like wave and particle duality. There's something called "Are You a Bayesian or a Frequentist?" that kind of tries to make it really clear.
So, decision theory: you're talking about loss functions, which are a function of data X and parameter theta. You don't know the data a priori, it's random, and you don't know the parameter. So you have this function of two things you don't know, and you're trying to say, I want that function to be small. But you can't do that directly, so you have to average over these quantities or maximize over them or something, so that I turn that uncertainty into something certain. So you could look at the first argument and average over it, or you could look at the second argument and average over it.
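A rough way to write down what he is describing, in notation that is not from the clip itself: the loss couples the unknown parameter to the data through a decision rule, and the two schools differ in which argument they average over.

```latex
% Loss when the parameter is \theta and decision rule \delta is applied to data X
% (notation assumed here, not taken from the conversation):  L(\theta, \delta(X))

% Frequentist: fix \theta and average over the data X (the risk function)
R(\theta, \delta) = \mathbb{E}_{X \mid \theta}\left[ L(\theta, \delta(X)) \right]

% Bayesian: condition on the observed data x and average over \theta
% under the posterior (the posterior expected loss)
\rho(\delta \mid x) = \mathbb{E}_{\theta \mid X = x}\left[ L(\theta, \delta(x)) \right]
```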
So the frequentist says, I'm gonna look at the X, the data, and average over it. And so it's looking at all the datasets you could get, and asking how well a certain procedure will do across all of them, when people are using it on all kinds of datasets. You'd like to be able to say that 95% of the time it will do the right thing.
The Bayesian says, I'm gonna look at the other argument of the loss function, the parameter. So I could have my own personal probability for what it is. Say I'm trying to infer the average height of the population. Well, I have an idea of roughly what the height is, so I put a prior on it. So now that loss function has, again, only one argument that's uncertain. And that's what a Bayesian does: they say, well, let's just focus on the particular X we got and condition on that. Conditioned on the X, I say something about my loss. And the Bayesian will argue that it's not relevant to look at all the other datasets you could have gotten and average over them, the frequentist approach; it's really only the dataset you got, all right?
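As a toy illustration of the two kinds of averaging on the height example (a minimal sketch; the model, the numbers, and the prior below are assumptions of mine, not from the conversation):

```python
# Toy contrast of the two averages on the "average height" example:
# normal data with known noise scale.
import numpy as np

rng = np.random.default_rng(0)
true_mean, sigma, n = 170.0, 10.0, 25   # assumed ground truth for the simulation

# Frequentist view: average over datasets X. Across many repeated samples,
# the standard interval  xbar +/- 1.96*sigma/sqrt(n)  covers the true mean
# about 95% of the time, whatever the true mean happens to be.
covered = 0
for _ in range(10_000):
    x = rng.normal(true_mean, sigma, n)
    half = 1.96 * sigma / np.sqrt(n)
    covered += (x.mean() - half <= true_mean <= x.mean() + half)
print("frequentist coverage over repeated datasets:", covered / 10_000)

# Bayesian view: condition on the one dataset you actually got and average
# over theta. With a normal prior on the mean (roughly "I have an idea of
# what the height is"), the posterior is again normal (conjugate update).
x_obs = rng.normal(true_mean, sigma, n)          # the single observed dataset
prior_mean, prior_sd = 175.0, 15.0               # assumed prior, for illustration
prec = 1 / prior_sd**2 + n / sigma**2            # posterior precision
post_mean = (prior_mean / prior_sd**2 + x_obs.sum() / sigma**2) / prec
post_sd = np.sqrt(1 / prec)
print("posterior for the mean given this dataset:", post_mean, "+/-", post_sd)
```

The first number is a statement about the procedure across hypothetical repeated datasets; the second is a statement about the parameter given the single dataset actually observed.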
And that Bayesian perspective makes sense, especially in situations where you're working with a scientist. You can learn a lot about the domain, you really only focus on certain kinds of data, you've gathered your data, and you make inferences. I don't agree with it though, in the sense that often you're writing software and people are using it out there on all kinds of datasets. So these two things have got to fight each other.
It's kind of arguably philosophically more reasonable: you realize there's a bunch of things you don't know, so you're uncertain about certain quantities. At that point, ask: is there a reasonable way to put probabilities on them? And in some cases there's quite a reasonable thing to do; there's a natural thing you can observe in the world that tells you something about those quantities.
- So, based on math or based on human expertise?
- The math kind of guides you along that path: under certain assumptions, this thing will work.
So you asked the question, what's my favorite? One of them is the false discovery rate, which is: you're making not just one hypothesis test or one decision, you're making a whole bag of them. And among all of those, you look at the ones where you made a discovery, where you announced that something interesting had happened. All right, that's gonna be some subset of your big bag. You'd like the fraction of your false discoveries in that subset to be small.
This is related to things like precision, or recall, or sensitivity and specificity. The classical criteria say: given that the truth is that the null hypothesis is true (or not), what's the probability that I make a certain decision? So it's kind of going forward from the state of nature to the data. The Bayesian goes the other direction, from the data back to the state of nature. And that's actually what false discovery rate is: given that I made a discovery, what's the probability that the null hypothesis was actually true?
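Written out (my formalization, not notation from the clip), with R total discoveries of which V are false:

```latex
% False discovery rate: the expected fraction of discoveries that are false
\mathrm{FDR} \;=\; \mathbb{E}\!\left[ \frac{V}{\max(R,\,1)} \right]

% The "other direction" he describes: condition on having made a discovery
% and ask how likely it is that the null hypothesis was actually true
\Pr(H_0 \ \text{true} \mid \text{discovery made})
```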
And so the classical frequentist looks at that quantity and says, well, I can't know that; there are some priors needed in that. And the empirical Bayesian goes ahead and plows forward and says, some of those things can actually be estimated from the data. So this kind of line of argument has come out, but it sort of goes back to Robbins around 1960. Brad Efron has written beautifully about this. And the FDR is, you know, Benjamini in Israel, and John Storey did the Bayesian interpretation, and so on.
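A minimal sketch of the empirical-Bayes move described above, entirely my own (the function names, the level q = 0.05, and the toy data are assumptions, not anything from the clip): Benjamini-Hochberg controls the false discovery rate, and a Storey-style estimate reads the fraction of true nulls off the p-value distribution itself, estimating from the data something the classical frequentist would treat as an unknowable prior.

```python
# Benjamini-Hochberg FDR control plus a Storey-style empirical-Bayes estimate
# of the fraction of true nulls (illustrative sketch, not code from the clip).
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of discoveries, controlling FDR at level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # largest k with p_(k) <= (k/m) * q; reject all hypotheses up to that rank
    thresh = (np.arange(1, m + 1) / m) * q
    below = np.nonzero(ranked <= thresh)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        reject[order[: below[-1] + 1]] = True
    return reject

def storey_pi0(pvals, lam=0.5):
    """Estimate the proportion of true nulls from the p-value histogram."""
    p = np.asarray(pvals)
    return min(1.0, (p > lam).mean() / (1.0 - lam))

# Toy data: 900 true nulls (uniform p-values) and 100 real signals.
rng = np.random.default_rng(1)
p = np.concatenate([rng.uniform(size=900), rng.beta(0.1, 5.0, size=100)])
rejected = benjamini_hochberg(p, q=0.05)
print("discoveries:", rejected.sum(), "  estimated pi0:", round(storey_pi0(p), 2))
```

On this toy mixture the estimate should land near the true null fraction of 0.9, which is the "you can estimate it from the data" point.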
So I've just absorbed these things over the years and find it a very healthy way to think about statistics.