George Hotz: Winning - A Reinforcement Learning Approach | AI Podcast Clips
00:00:10.320 |
If you look five years into the future, what does winning look like? 00:00:16.960 |
There's a lot of... I can go into technical depth about what I mean by that, by "win." 00:00:27.840 |
It may not mean... I was criticized for that in the comments, like, 00:00:30.800 |
"Doesn't this guy want to save the penguins in Antarctica?" or something like that. 00:00:34.800 |
Oh man, listen to what I'm saying. I'm not talking about having a yacht or something. 00:00:49.920 |
But if you're a reinforcement... if you're an intelligent agent and you're put into a world, 00:00:55.600 |
the ideal thing, mathematically (you can go back to Schmidhuber's theories about this), 00:00:59.600 |
is to build a compressive model of the world, 00:01:02.880 |
to build a maximally compressive model, to explore the world such that your exploration function 00:01:08.160 |
maximizes the derivative of compression of the past. 00:01:13.120 |
And I took that as kind of a personal goal function. 00:01:17.760 |
So by "win" I mean... maybe this is religious, but I think that in the future, 00:01:23.680 |
I might be given a real purpose or I may decide this purpose myself. 00:01:27.120 |
And at that point, I'll know what the game is and I'll know how to win. 00:01:30.640 |
I think right now, I'm still just trying to figure out what the game is. 00:01:33.920 |
So you have imperfect information. You have a lot of uncertainty about the reward function. 00:01:44.240 |
The purpose is to maximize it while you have a lot of uncertainty around it. 00:01:50.320 |
And you're both reducing the uncertainty and maximizing at the same time. 00:01:57.520 |
If you believe in the universal prior, what is the universal reward function? 00:02:06.080 |
I think I speak for everyone in saying that we wonder what that reward function is for you. 00:02:13.760 |
And I look forward to seeing that in five years and ten years. 00:02:19.360 |
I think a lot of people, including myself, are cheering you on, man. 00:02:22.240 |
So I'm happy you exist and I wish you the best of luck.