George Hotz: Winning - A Reinforcement Learning Approach

You said that the meaning of life is to win. If you look five years into the future, what does winning look like?

So... I... there's a lot of... I can go into, like, technical depth on what I mean by that, to win. It may not mean... I was criticized for that in the comments, like, "Doesn't this guy want to, like, save the penguins in Antarctica?" or like...

Oh man, you know, listen to what I'm saying. I'm not talking about, like, having a yacht or something. Yeah. I am an agent. I am put into this world, and I don't really know what my purpose is. But if you're a reinforcement... if you're an intelligent agent and you're put into a world, what is the ideal thing to do?

Well, the ideal thing, mathematically (you can go back to Schmidhuber's theories about this), is to build a compressive model of the world, a maximally compressive one, and to explore the world such that your exploration function maximizes the derivative of compression of the past. Schmidhuber has a paper about this.
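As a rough illustration of "maximizing the derivative of compression of the past", the toy sketch below measures how many bits a simple predictive model saves on a stream of observations once it has learned from that stream. Everything here is an assumption made for illustration and is not taken from the conversation or from Schmidhuber's paper: the two-symbol alphabet, the order-1 `MarkovModel`, the add-one smoothing, and the `compression_progress` helper.

```python
import math
from collections import defaultdict

ALPHABET = "ab"  # hypothetical two-symbol world, chosen for illustration


class MarkovModel:
    """Order-1 predictor with add-one smoothing; stands in for a learned world model."""

    def __init__(self):
        self.counts = defaultdict(int)  # (previous symbol, next symbol) -> count

    def prob(self, prev, cur):
        row_total = sum(self.counts[(prev, s)] for s in ALPHABET)
        return (self.counts[(prev, cur)] + 1) / (row_total + len(ALPHABET))

    def code_length(self, seq):
        """Bits an ideal coder would need for the transitions in seq under the current model."""
        return sum(-math.log2(self.prob(p, c)) for p, c in zip(seq, seq[1:]))

    def update(self, seq):
        for p, c in zip(seq, seq[1:]):
            self.counts[(p, c)] += 1


def compression_progress(model, seq):
    """Bits saved on the same data once the model has learned from it."""
    before = model.code_length(seq)
    model.update(seq)
    after = model.code_length(seq)
    return before - after


regular = "ab" * 10                 # stream with learnable structure
irregular = "abbabaabbbabaababbab"  # hand-written stream with little structure

model = MarkovModel()
first_pass = compression_progress(model, regular)    # model sees the pattern for the first time
second_pass = compression_progress(model, regular)   # pattern is already mostly learned
noise_pass = compression_progress(MarkovModel(), irregular)

print(f"regular stream, first pass:  {first_pass:.2f} bits saved")
print(f"regular stream, second pass: {second_pass:.2f} bits saved")
print(f"irregular stream:            {noise_pass:.2f} bits saved")
```

In this toy setting the regular stream yields a large saving on the first pass and much less on the second, while the irregular stream yields little at any point. An explorer driven by this signal would seek out data it can still learn to compress and move on once the progress flattens.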

And I took that as kind of a personal goal function. So what I mean by winning is, maybe this is religious, but I think that in the future I might be given a real purpose, or I may decide this purpose myself. And then at that point, I know what the game is and I know how to win.

I think right now, I'm still just trying to figure out what the game is. But once I know...

So you have imperfect information, you have a lot of uncertainty about the reward function, and you're discovering it.

Exactly. But the purpose is... That's a better way to put it. The purpose is to maximize it while you have a lot of uncertainty around it.
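One standard way to picture maximizing a reward while still uncertain about it is a multi-armed bandit solved with Thompson sampling. This is a swapped-in textbook technique, not something named in the conversation, and the arm probabilities, step count, and function name below are hypothetical. The agent keeps a Beta belief over each arm's payoff, acts on a sample from those beliefs, and sharpens the beliefs with every outcome.

```python
import random


def thompson_sampling(true_probs, steps, seed=0):
    """Pull Bernoulli arms for `steps` rounds; return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    wins = [1] * n_arms    # Beta alpha parameters (uniform prior)
    losses = [1] * n_arms  # Beta beta parameters (uniform prior)
    pulls = [0] * n_arms
    for _ in range(steps):
        # Sample a plausible payoff for each arm from the current belief.
        samples = [rng.betavariate(wins[i], losses[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # The true payoff probabilities are hidden from the agent; only outcomes are seen.
        reward = 1 if rng.random() < true_probs[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], steps=2000)
print(pulls)
```

Early on the pulls are spread across arms because the beliefs are wide; as evidence accumulates, most pulls concentrate on the arm that appears best, so learning about the reward and collecting it happen in the same loop.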

And you're both reducing the uncertainty and maximizing at the same time.

Yeah. And so that's at the technical level. What is the... If you believe in the universal prior, what is the universal reward function? That's the better way to put it.

So that win is interesting. I think I speak for everyone in saying that I wonder what that reward function is for you.

And I look forward to seeing that in five years and 10 years. I think a lot of people, including myself, are cheering you on, man. So I'm happy you exist and I wish you the best of luck.

Thank you. Thank you.

George Hotz: Winning - A Reinforcement Learning Approach | AI Podcast Clips
