
Greg Brockman: OpenAI and AGI | Lex Fridman Podcast #17


Chapters

0:01 Greg Brockman
4:45 Technological Determinism
7:00 Wikipedia
13:55 Technical Safety
16:18 Policy Team
18:45 History of AI
21:44 Generality
22:03 Competence
24:58 Formation of OpenAI
31:29 The Startup Mindset
40:28 Switching from Competition to Collaboration in Late-Stage AGI Development
47:01 Defaulting to Caution When Releasing Models
77:33 The Reasoning Team
80:28 Simulation for Self-Driving Cars

Transcript

00:00:00.000 | The following is a conversation with Greg Brockman.
00:00:02.900 | He's the co-founder and CTO of OpenAI,
00:00:05.380 | a world-class research organization,
00:00:07.440 | developing ideas in AI with the goal of eventually
00:00:10.820 | creating a safe and friendly
00:00:12.900 | artificial general intelligence,
00:00:15.380 | one that benefits and empowers humanity.
00:00:18.840 | OpenAI is not only a source of publications,
00:00:21.740 | algorithms, tools, and data sets.
00:00:24.480 | Their mission is a catalyst for an important public discourse
00:00:28.140 | about our future with both narrow
00:00:30.700 | and general intelligence systems.
00:00:34.040 | This conversation is part of the
00:00:35.820 | Artificial Intelligence Podcast at MIT and beyond.
00:00:39.520 | If you enjoy it, subscribe on YouTube, iTunes,
00:00:42.760 | or simply connect with me on Twitter
00:00:44.560 | at Lex Fridman, spelled F-R-I-D.
00:00:48.060 | And now, here's my conversation with Greg Brockman.
00:00:51.820 | So in high school, and right after, you wrote a draft
00:00:55.100 | of a chemistry textbook, I saw that.
00:00:57.300 | That covers everything from basic structure
00:00:59.100 | of the atom to quantum mechanics.
00:01:01.420 | So it's clear you have an intuition and a passion
00:01:04.420 | for both the physical world with chemistry
00:01:08.340 | and now robotics to the digital world
00:01:11.420 | with AI, deep learning, reinforcement learning, and so on.
00:01:15.400 | Do you see the physical world and the digital world
00:01:17.380 | as different, and what do you think is the gap?
00:01:20.540 | - A lot of it actually boils down to iteration speed.
00:01:23.380 | That I think that a lot of what really motivates me
00:01:25.260 | is building things, right?
00:01:26.540 | Just the, you know, think about mathematics, for example,
00:01:29.340 | where you think really hard about a problem,
00:01:30.900 | you understand it, you write it down
00:01:32.220 | in this very obscure form that you call proof.
00:01:34.580 | But then, this is in humanity's library, right?
00:01:37.620 | It's there forever, this is some truth that we've discovered.
00:01:40.700 | You know, maybe only five people in your field
00:01:42.100 | will ever read it, but somehow you've kind of
00:01:43.820 | moved humanity forward.
00:01:45.740 | And so I actually used to really think
00:01:46.900 | that I was going to be a mathematician,
00:01:48.660 | and then I actually started writing this chemistry textbook.
00:01:51.680 | One of my friends told me, you'll never publish it
00:01:53.380 | because you don't have a PhD.
00:01:54.860 | So instead, I decided to build a website
00:01:57.980 | and try to promote my ideas that way.
00:01:59.940 | And then I discovered programming.
00:02:01.460 | And I, you know, that in programming,
00:02:03.940 | you think hard about a problem, you understand it,
00:02:06.060 | you write it down in a very obscure form
00:02:08.020 | that we call a program.
00:02:10.000 | But then once again, it's in humanity's library, right?
00:02:12.200 | And anyone can get the benefit from it,
00:02:14.060 | and the scalability is massive.
00:02:15.660 | And so I think that the thing that really appeals to me
00:02:17.740 | about the digital world is that you can have
00:02:19.860 | this insane leverage, right?
00:02:21.960 | A single individual with an idea
00:02:24.180 | is able to affect the entire planet.
00:02:26.060 | And that's something I think is really hard to do
00:02:28.180 | if you're moving around physical atoms.
00:02:30.220 | - But you said mathematics, so if you look at the wet thing
00:02:35.020 | over here, our mind, do you ultimately see it as just math,
00:02:39.780 | as just information processing?
00:02:41.740 | Or is there some other magic, as you've seen,
00:02:44.380 | if you've seen through biology and chemistry and so on?
00:02:47.020 | - I think it's really interesting to think about humans
00:02:48.980 | as just information processing systems.
00:02:50.980 | And that it seems like it's actually a pretty good way
00:02:54.060 | of describing a lot of how the world works,
00:02:57.140 | or a lot of what we're capable of,
00:02:58.500 | to think that, again, if you just look
00:03:01.140 | at technological innovations over time,
00:03:03.620 | that in some ways, the most transformative innovation
00:03:05.900 | that we've had has been the computer, right?
00:03:07.740 | In some ways, the internet, what has the internet done,
00:03:10.300 | right, the internet is not about these physical cables.
00:03:12.700 | It's about the fact that I am suddenly able
00:03:14.780 | to instantly communicate with any other human on the planet.
00:03:17.620 | I'm able to retrieve any piece of knowledge
00:03:19.660 | that in some ways the human race has ever had.
00:03:22.660 | And that those are these insane transformations.
00:03:26.100 | - Do you see our society as a whole, the collective,
00:03:29.340 | as another extension of the intelligence
00:03:31.260 | of the human being?
00:03:32.280 | So if you look at the human being
00:03:33.420 | as an information processing system,
00:03:35.060 | you mentioned the internet, the networking,
00:03:36.920 | do you see us all together as a civilization,
00:03:39.360 | as a kind of intelligence system?
00:03:41.680 | - Yeah, I think this is actually
00:03:42.900 | a really interesting perspective to take
00:03:44.900 | and to think about that you sort of have
00:03:46.740 | this collective intelligence of all of society.
00:03:49.520 | The economy itself is this superhuman machine
00:03:51.660 | that is optimizing something, right?
00:03:54.420 | And it's, in some ways, a company has a will of its own,
00:03:57.700 | right, that you have all these individuals
00:03:59.060 | who are all pursuing their own individual goals
00:04:00.820 | and thinking really hard and thinking
00:04:02.380 | about the right things to do,
00:04:03.620 | but somehow the company does something
00:04:05.340 | that is this emergent thing
00:04:07.900 | and that is a really useful abstraction.
00:04:10.620 | And so I think that in some ways,
00:04:12.460 | we think of ourselves as the most intelligent things
00:04:14.880 | on the planet and the most powerful things on the planet,
00:04:17.480 | but there are things that are bigger than us,
00:04:19.300 | that are these systems that we all contribute to.
00:04:21.420 | And so I think actually it's interesting to think about,
00:04:24.980 | if you've read Isaac Asimov's foundation, right,
00:04:27.420 | that there's this concept of psychohistory in there,
00:04:30.020 | which is effectively this,
00:04:31.020 | that if you have trillions or quadrillions of beings,
00:04:33.900 | then maybe you could actually predict what that being,
00:04:36.540 | that huge macro being will do
00:04:39.060 | and almost independent of what the individuals want.
00:04:42.380 | I actually have a second angle on this
00:04:44.220 | that I think is interesting,
00:04:45.040 | which is thinking about technological determinism.
00:04:48.380 | One thing that I actually think a lot about with OpenAI,
00:04:51.260 | right, is that we're kind of coming on
00:04:53.300 | to this insanely transformational technology
00:04:55.860 | of general intelligence, right,
00:04:57.380 | that will happen at some point.
00:04:58.820 | And there's a question of how can you take actions
00:05:01.540 | that will actually steer it to go better rather than worse?
00:05:04.860 | And that I think one question you need to ask
00:05:06.540 | is as a scientist, as an inventor, as a creator,
00:05:09.300 | what impact can you have in general, right?
00:05:11.700 | You look at things like the telephone
00:05:12.860 | invented by two people on the same day.
00:05:14.820 | Like, what does that mean?
00:05:15.940 | Like, what does that mean about the shape of innovation?
00:05:18.100 | And I think that what's going on
00:05:19.260 | is everyone's building on the shoulders of the same giants.
00:05:21.700 | And so you can kind of,
00:05:22.900 | you can't really hope to create something
00:05:24.500 | no one else ever would.
00:05:25.700 | You know, if Einstein wasn't born,
00:05:27.060 | someone else would have come up with relativity.
00:05:29.220 | You know, we changed the timeline a bit, right,
00:05:31.020 | that maybe it would have taken another 20 years,
00:05:32.980 | but it wouldn't be that fundamentally,
00:05:34.180 | humanity would never discover these fundamental truths.
00:05:37.340 | - So there's some kind of invisible momentum
00:05:40.460 | that some people like Einstein or OpenAI is plugging into
00:05:45.380 | that anybody else can also plug into
00:05:47.820 | and ultimately that wave takes us into a certain direction.
00:05:51.860 | That's what you mean by determinism.
00:05:51.860 | - That's right, that's right.
00:05:52.820 | And you know, this kind of seems to play out
00:05:54.220 | in a bunch of different ways,
00:05:55.700 | that there's some exponential that is being ridden
00:05:58.020 | and that the exponential itself, which one it is, changes.
00:06:00.620 | Think about Moore's law,
00:06:01.540 | an entire industry set its clock to it for 50 years.
00:06:04.820 | Like, how can that be, right?
00:06:06.180 | How is that possible?
00:06:07.380 | And yet somehow it happened.
00:06:09.300 | And so I think you can't hope to ever invent something
00:06:12.180 | that no one else will.
00:06:13.340 | Maybe you can change the timeline a little bit,
00:06:15.340 | but if you really want to make a difference,
00:06:17.380 | I think that the thing that you really have to do,
00:06:19.420 | the only real degree of freedom you have
00:06:21.340 | is to set the initial conditions
00:06:23.060 | under which a technology is born.
00:06:24.940 | And so you think about the internet, right?
00:06:26.660 | That there are lots of other competitors
00:06:27.860 | trying to build similar things and the internet won
00:06:30.780 | and that the initial conditions
00:06:33.260 | were that it was created by this group
00:06:34.700 | that really valued people being able to be,
00:06:37.260 | anyone being able to plug in this very academic mindset
00:06:40.260 | of being open and connected.
00:06:42.100 | And I think that the internet for the next 40 years
00:06:44.380 | really played out that way.
00:06:46.340 | You know, maybe today things are starting to shift
00:06:48.780 | in a different direction,
00:06:49.820 | but I think that those initial conditions
00:06:51.140 | were really important to determine
00:06:52.740 | the next 40 years worth of progress.
00:06:55.060 | - That's really beautifully put.
00:06:56.420 | So another example that I think about,
00:06:58.780 | you know, I recently looked at it.
00:07:00.780 | I looked at Wikipedia, the formation of Wikipedia,
00:07:03.780 | and I wonder what the internet would be like
00:07:05.500 | if Wikipedia had ads.
00:07:07.740 | You know, there's an interesting argument
00:07:09.620 | that why they chose not to make it,
00:07:12.580 | put advertisement on Wikipedia.
00:07:14.220 | I think Wikipedia is one of the greatest resources
00:07:17.780 | we have on the internet.
00:07:18.900 | It's extremely surprising how well it works
00:07:21.220 | and how well it was able to aggregate
00:07:22.900 | all this kind of good information.
00:07:24.980 | And essentially the creator of Wikipedia,
00:07:27.260 | I don't know, there's probably some debates there,
00:07:29.300 | but set the initial conditions
00:07:31.140 | and how it carried itself forward.
00:07:33.220 | That's really interesting.
00:07:34.060 | So the way you're thinking about AGI
00:07:36.500 | or artificial intelligence is you're focused
00:07:38.420 | on setting the initial conditions for the progress.
00:07:41.140 | - That's right.
00:07:42.300 | - That's powerful.
00:07:43.120 | - Okay, so look into the future.
00:07:45.540 | If you create an AGI system,
00:07:48.140 | like one that can ace the Turing test, natural language,
00:07:51.580 | what do you think would be the interactions
00:07:54.780 | you would have with it?
00:07:55.860 | What do you think are the questions you would ask?
00:07:57.740 | Like what would be the first question you would ask it,
00:08:00.940 | her, him?
00:08:01.820 | - That's right.
00:08:02.660 | I think that at that point,
00:08:03.940 | if you've really built a powerful system
00:08:05.940 | that is capable of shaping the future of humanity,
00:08:08.500 | the first question that you really should ask
00:08:10.260 | is how do we make sure that this plays out well?
00:08:12.300 | And so that's actually the first question
00:08:14.000 | that I would ask a powerful AGI system is--
00:08:17.640 | - So you wouldn't ask your colleague,
00:08:19.200 | you wouldn't ask like Ilya,
00:08:20.800 | you would ask the AGI system.
00:08:22.320 | - Oh, we've already had the conversation with Ilya, right?
00:08:24.660 | And everyone here.
00:08:25.760 | And so you want as many perspectives
00:08:27.520 | and as much wisdom as you can
00:08:29.720 | for answering this question.
00:08:31.260 | So I don't think you necessarily defer
00:08:32.480 | to whatever your powerful system tells you,
00:08:35.480 | but you use it as one input to try to figure out what to do.
00:08:39.280 | But I guess fundamentally what it really comes down to
00:08:41.840 | is if you built something really powerful,
00:08:43.940 | and you think about, for example,
00:08:45.260 | the creation of, shortly after,
00:08:47.620 | the creation of nuclear weapons,
00:08:48.900 | the most important question in the world
00:08:50.400 | was what's the world order going to be like?
00:08:52.780 | How do we set ourselves up in a place
00:08:54.900 | where we're going to be able to survive as a species?
00:08:58.300 | With AGI, I think the question's slightly different,
00:09:00.660 | that there is a question of how do we make sure
00:09:02.740 | that we don't get the negative effects?
00:09:04.420 | But there's also the positive side.
00:09:06.260 | You imagine that, what will AGI be like?
00:09:09.900 | What will it be capable of?
00:09:11.300 | And I think that one of the core reasons
00:09:13.480 | that an AGI can be powerful and transformative
00:09:15.720 | is actually due to technological development.
00:09:18.880 | If you have something that's as capable as a human
00:09:21.400 | and that it's much more scalable,
00:09:23.840 | that you absolutely want that thing
00:09:25.840 | to go read the whole scientific literature
00:09:27.600 | and think about how to create cures for all the diseases.
00:09:29.960 | You want it to think about how to go and build technologies
00:09:32.840 | to help us create material abundance
00:09:34.460 | and to figure out societal problems
00:09:37.280 | that we have trouble with,
00:09:38.120 | like how are we supposed to clean up the environment?
00:09:39.960 | And maybe you want this to go and invent
00:09:42.780 | a bunch of little robots that will go out
00:09:44.100 | and be biodegradable and turn ocean debris
00:09:47.240 | into harmless molecules.
00:09:49.620 | And I think that that positive side
00:09:53.980 | is something that I think people miss sometimes
00:09:56.100 | when thinking about what an AGI will be like.
00:09:58.140 | And so I think that if you have a system
00:10:00.260 | that's capable of all of that,
00:10:01.580 | you absolutely want its advice about how do I make sure
00:10:03.940 | that we're using your capabilities
00:10:07.560 | in a positive way for humanity.
00:10:09.180 | - So what do you think about that psychology
00:10:11.380 | that looks at all the different possible trajectories
00:10:14.760 | of an AGI system, many of which,
00:10:17.500 | perhaps the majority of which are positive,
00:10:19.940 | and nevertheless focuses on the negative trajectories?
00:10:23.300 | I mean, you get to interact with folks,
00:10:24.700 | you get to think about this,
00:10:26.020 | maybe within yourself as well.
00:10:28.820 | You look at Sam Harris and so on.
00:10:30.520 | It seems to be, sorry to put it this way,
00:10:32.720 | but almost more fun to think about
00:10:34.540 | the negative possibilities.
00:10:37.780 | Whatever that's deep in our psychology,
00:10:39.540 | what do you think about that?
00:10:40.820 | And how do we deal with it?
00:10:41.900 | Because we want AI to help us.
00:10:44.380 | - So I think there's kind of two problems
00:10:47.860 | entailed in that question.
00:10:49.940 | The first is more of the question of
00:10:52.380 | how can you even picture what a world
00:10:54.620 | with a new technology will be like?
00:10:56.580 | Now imagine we're in 1950,
00:10:57.860 | and I'm trying to describe Uber to someone.
00:11:00.020 | (laughing)
00:11:02.860 | - Apps and the internet.
00:11:05.380 | Yeah, I mean, that's going to be extremely complicated.
00:11:08.900 | But it's imaginable.
00:11:10.180 | - It's imaginable, right?
00:11:11.900 | And now imagine being in 1950 and predicting Uber, right?
00:11:15.300 | And you need to describe the internet,
00:11:17.660 | you need to describe GPS,
00:11:18.740 | you need to describe the fact that
00:11:20.500 | everyone's going to have this phone in their pocket.
00:11:23.060 | And so I think that just the first truth
00:11:26.180 | is that it is hard to picture
00:11:28.040 | how a transformative technology will play out in the world.
00:11:31.180 | We've seen that before with technologies
00:11:32.780 | that are far less transformative than AGI will be.
00:11:35.580 | And so I think that one piece is that
00:11:37.780 | it's just even hard to imagine
00:11:39.540 | and to really put yourself in a world
00:11:41.620 | where you can predict what that positive vision
00:11:44.620 | would be like.
00:11:45.780 | And I think the second thing is that
00:11:48.900 | it is, I think it is always easier to support
00:11:53.300 | the negative side than the positive side.
00:11:55.060 | It's always easier to destroy than create.
00:11:57.160 | And less in a physical sense
00:12:00.780 | and more just in an intellectual sense, right?
00:12:03.100 | Because I think that with creating something,
00:12:05.680 | you need to just get a bunch of things right
00:12:07.420 | and to destroy, you just need to get one thing wrong.
00:12:10.300 | And so I think that what that means
00:12:12.060 | is that I think a lot of people's thinking dead ends
00:12:14.260 | as soon as they see the negative story.
00:12:16.900 | But that being said, I actually have some hope, right?
00:12:20.340 | I think that the positive vision
00:12:23.180 | is something that I think can be,
00:12:26.020 | is something that we can talk about.
00:12:27.580 | And I think that just simply saying this fact of,
00:12:30.240 | yeah, like there's positives, there's negatives,
00:12:31.980 | everyone likes to dwell on the negative,
00:12:33.580 | people actually respond well to that message
00:12:35.100 | and say, huh, you're right, there's a part of this
00:12:37.060 | that we're not talking about, not thinking about.
00:12:39.660 | And that's actually something that's, I think,
00:12:41.500 | really been a key part of how we think about AGI
00:12:45.380 | at OpenAI, right?
00:12:46.660 | You can kind of look at it as like, okay,
00:12:48.180 | like OpenAI talks about the fact that there are risks
00:12:51.020 | and yet they're trying to build this system.
00:12:53.200 | Like, how do you square those two facts?
00:12:56.100 | - So do you share the intuition that some people have,
00:12:59.180 | I mean, from Sam Harris to even Elon Musk himself,
00:13:02.740 | that it's tricky as you develop AGI
00:13:06.620 | to keep it from slipping into the existential threats,
00:13:10.460 | into the negative?
00:13:11.780 | What's your intuition about how hard is it
00:13:14.820 | to keep AI development on the positive track?
00:13:19.700 | What's your intuition there?
00:13:20.740 | - To answer that question, you can really look
00:13:22.300 | at how we structure OpenAI.
00:13:23.980 | So we really have three main arms.
00:13:25.940 | We have capabilities, which is actually doing
00:13:27.980 | the technical work and pushing forward
00:13:29.900 | what these systems can do.
00:13:31.220 | There's safety, which is working on technical mechanisms
00:13:35.180 | to ensure that the systems we build
00:13:36.980 | are aligned with human values.
00:13:38.500 | And then there's policy, which is making sure
00:13:40.740 | that we have governance mechanisms,
00:13:42.060 | answering that question of, well, whose values?
00:13:45.300 | And so I think that the technical safety one
00:13:47.420 | is the one that people kind of talk about the most, right?
00:13:50.500 | You talk about, like, think about all of the dystopic AI
00:13:53.860 | movies, a lot of that is about not having
00:13:55.820 | good technical safety in place.
00:13:57.580 | And what we've been finding is that,
00:13:59.860 | you know, I think that actually a lot of people
00:14:01.380 | look at the technical safety problem
00:14:02.660 | and think it's just intractable.
00:14:04.260 | Right, this question of what do humans want?
00:14:07.860 | How am I supposed to write that down?
00:14:09.180 | Can I even write down what I want?
00:14:11.220 | No way.
00:14:12.060 | And then they stop there.
00:14:14.820 | But the thing is, we've already built systems
00:14:16.900 | that are able to learn things that humans can't specify.
00:14:20.940 | You know, even the rules for how to recognize
00:14:22.940 | if there's a cat or a dog in an image.
00:14:24.980 | Turns out it's intractable to write that down,
00:14:26.540 | and yet we're able to learn it.
00:14:28.460 | And that what we're seeing with systems we build at OpenAI,
00:14:31.100 | and they're still in early proof of concept stage,
00:14:33.820 | is that you are able to learn human preferences.
00:14:36.340 | You're able to learn what humans want from data.
00:14:38.980 | And so that's kind of the core focus
00:14:40.420 | for our technical safety team.
00:14:41.780 | And I think that there actually,
00:14:43.820 | we've had some pretty encouraging updates
00:14:45.700 | in terms of what we've been able to make work.
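[Editorial note, not part of the conversation: a minimal sketch of the kind of preference-learning setup Greg alludes to above, where a model learns "what humans want" from comparison data rather than from a hand-written specification. This is a generic Bradley-Terry-style reward model, not OpenAI's actual code; PyTorch, the network architecture, the tensor shapes, and the random toy data are all assumptions made for illustration.]

```python
# Hypothetical sketch: learn a reward model from pairwise human preferences.
# Assumptions: PyTorch is available; the architecture and toy data are illustrative only.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a behavior's feature vector; higher score means 'more preferred'."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # shape: (batch,)

def preference_loss(r_a: torch.Tensor, r_b: torch.Tensor, prefer_a: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: push the model to assign higher reward
    to whichever segment the human labeled as preferred."""
    logits = r_a - r_b  # log-odds that segment A is preferred
    return nn.functional.binary_cross_entropy_with_logits(logits, prefer_a)

# Toy training loop on random data standing in for human comparison labels.
obs_dim = 8
model = RewardModel(obs_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    seg_a = torch.randn(32, obs_dim)               # candidate behavior A
    seg_b = torch.randn(32, obs_dim)               # candidate behavior B
    prefer_a = torch.randint(0, 2, (32,)).float()  # human label: 1 if A preferred
    loss = preference_loss(model(seg_a), model(seg_b), prefer_a)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design point mirrors the transcript: humans don't have to write down what they want; they only have to compare two behaviors, and the preference data trains the reward signal.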
00:14:48.060 | - So you have an intuition and a hope that from data,
00:14:51.700 | you know, looking at the value alignment problem,
00:14:53.660 | from data we can build systems that align
00:14:57.060 | with the collective better angels of our nature.
00:15:00.640 | So align with the ethics and the morals of human beings.
00:15:04.620 | - To even say this in a different way,
00:15:05.900 | I mean, think about how do we align humans, right?
00:15:08.620 | Think about like a human baby can grow up
00:15:10.420 | to be an evil person or a great person.
00:15:12.940 | And a lot of that is from learning from data, right?
00:15:15.240 | That you have some feedback as a child is growing up,
00:15:17.740 | they get to see positive examples.
00:15:19.220 | And so I think that just like,
00:15:21.980 | that the only example we have of a general intelligence
00:15:25.420 | that is able to learn from data
00:15:28.060 | to align with human values and to learn values,
00:15:31.420 | I think we shouldn't be surprised
00:15:32.900 | that we can do the same sorts of techniques
00:15:36.020 | or whether the same sort of techniques
00:15:37.420 | end up being how we solve value alignment for AGIs.
00:15:41.100 | - So let's go even higher.
00:15:42.700 | I don't know if you've read the book "Sapiens",
00:15:44.780 | but there's an idea that, you know,
00:15:48.300 | that as a collective, as us human beings,
00:15:49.980 | we kind of develop together ideas that we hold.
00:15:54.740 | There's no, in that context, objective truth.
00:15:57.920 | We just kind of all agree to certain ideas
00:15:59.980 | and hold them as a collective.
00:16:01.460 | Did you have a sense that there is,
00:16:03.500 | in the world of good and evil,
00:16:05.340 | do you have a sense that to the first approximation,
00:16:07.580 | there are some things that are good
00:16:10.260 | and that you could teach systems to behave to be good?
00:16:14.540 | - So I think that this actually blends into our third team,
00:16:18.300 | right, which is the policy team.
00:16:19.900 | And this is the one, the aspect I think people
00:16:22.300 | really talk about way less than they should, right?
00:16:25.300 | 'Cause imagine that we build super powerful systems
00:16:27.640 | that we've managed to figure out all the mechanisms
00:16:29.740 | for these things to do whatever the operator wants.
00:16:32.800 | The most important question becomes,
00:16:34.460 | who's the operator, what do they want,
00:16:36.700 | and how is that going to affect everyone else, right?
00:16:39.380 | And I think that this question of what is good,
00:16:43.060 | what are those values?
00:16:43.980 | I mean, I think you don't even have to go
00:16:45.940 | to those very grand existential places
00:16:48.400 | to start to realize how hard this problem is.
00:16:50.900 | You just look at different countries
00:16:52.860 | and cultures across the world,
00:16:54.500 | and that there's a very different conception
00:16:57.140 | of how the world works and what kinds of ways
00:17:01.940 | that society wants to operate.
00:17:03.380 | And so I think that the really core question
00:17:06.980 | is actually very concrete.
00:17:09.580 | And I think it's not a question
00:17:10.980 | that we have ready answers to, right?
00:17:12.700 | It's how do you have a world where all
00:17:15.900 | of the different countries that we have,
00:17:17.300 | United States, China, Russia,
00:17:19.760 | and the hundreds of other countries out there
00:17:22.760 | are able to continue to not just operate
00:17:26.620 | in the way that they see fit,
00:17:28.460 | but in the world that emerges
00:17:31.300 | where you have these very powerful systems
00:17:34.760 | operating alongside humans,
00:17:37.800 | ends up being something that empowers humans more,
00:17:39.820 | that makes human existence be a more meaningful thing,
00:17:44.120 | and that people are happier and wealthier
00:17:46.420 | and able to live more fulfilling lives.
00:17:48.980 | It's not an obvious thing for how to design that world
00:17:51.580 | once you have that very powerful system.
00:17:53.620 | - So if we take a little step back,
00:17:55.820 | and we're having a fascinating conversation,
00:17:58.220 | and OpenAI is in many ways a tech leader in the world,
00:18:02.020 | and yet we're thinking about these big existential questions
00:18:05.460 | which is fascinating and really important.
00:18:07.020 | I think you're a leader in that space,
00:18:09.160 | and that's a really important space,
00:18:10.860 | of just thinking how AI affects society
00:18:13.100 | in a big picture view.
00:18:14.380 | So Oscar Wilde said, "We're all in the gutter,
00:18:17.260 | "but some of us are looking at the stars,"
00:18:19.020 | and I think OpenAI has a charter
00:18:22.340 | that looks to the stars, I would say,
00:18:24.600 | to create intelligence, to create general intelligence,
00:18:26.900 | make it beneficial, safe, and collaborative.
00:18:29.500 | Can you tell me how that came about,
00:18:33.700 | how a mission like that,
00:18:35.140 | and the path to creating a mission like that
00:18:37.540 | at OpenAI was founded?
00:18:39.140 | - Yeah, so I think that in some ways
00:18:41.660 | it really boils down to taking a look
00:18:43.900 | at the landscape, right?
00:18:45.140 | So if you think about the history of AI,
00:18:47.060 | that basically for the past 60 or 70 years,
00:18:49.960 | people have thought about this goal
00:18:51.660 | of what could happen if you could automate
00:18:53.980 | human intellectual labor.
00:18:55.640 | Imagine you could build a computer system
00:18:58.300 | that could do that.
00:18:59.300 | What becomes possible?
00:19:00.580 | We have a lot of sci-fi that tells stories
00:19:02.420 | of various dystopias, and increasingly you have movies
00:19:04.940 | like "Her" that tell you a little bit about
00:19:06.500 | maybe more of a little bit utopic vision.
00:19:09.460 | You think about the impacts that we've seen
00:19:12.580 | from being able to have bicycles for our minds
00:19:16.300 | and computers, and I think that the impact
00:19:20.380 | of computers and the internet has just far outstripped
00:19:23.500 | what anyone really could have predicted.
00:19:26.220 | And so I think that it's very clear
00:19:27.460 | that if you can build an AGI,
00:19:29.380 | it will be the most transformative technology
00:19:31.620 | that humans will ever create.
00:19:33.060 | And so what it boils down to then is a question of,
00:19:36.860 | well, is there a path?
00:19:38.700 | Is there hope?
00:19:39.540 | Is there a way to build such a system?
00:19:41.700 | And I think that for 60 or 70 years,
00:19:43.660 | that people got excited and that ended up
00:19:47.340 | not being able to deliver on the hopes
00:19:49.500 | that people had pinned on them.
00:19:51.620 | And I think that then, that after two winters
00:19:54.900 | of AI development, that people, I think,
00:19:57.820 | kind of almost stopped daring to dream,
00:20:00.340 | right, that really talking about AGI
00:20:02.020 | or thinking about AGI became almost this taboo
00:20:04.820 | in the community.
00:20:05.660 | But I actually think that people took the wrong lesson
00:20:08.700 | from AI history.
00:20:10.100 | And if you look back, starting in 1959
00:20:12.420 | is when the Perceptron was released.
00:20:14.260 | And this is basically one of the earliest neural networks.
00:20:17.700 | It was released to what was perceived
00:20:19.260 | as this massive overhype.
00:20:20.860 | So in the New York Times in 1959,
00:20:22.340 | you have this article saying that the Perceptron
00:20:26.420 | will one day recognize people, call out their names,
00:20:29.180 | instantly translate speech between languages.
00:20:31.500 | And people at the time looked at this and said,
00:20:33.860 | this is, your system can't do any of that.
00:20:36.140 | And basically spent 10 years trying to discredit
00:20:38.100 | the whole Perceptron direction and succeeded.
00:20:40.660 | And all the funding dried up and people kind of went
00:20:44.020 | in other directions.
00:20:44.980 | And in the '80s, there was this resurgence.
00:20:46.940 | And I'd always heard that the resurgence in the '80s
00:20:49.300 | was due to the invention of back propagation
00:20:51.500 | and these algorithms that got people excited.
00:20:53.740 | But actually the causality was due to people
00:20:55.780 | building larger computers.
00:20:57.180 | That you can find these articles from the '80s
00:20:59.100 | saying that the democratization of computing power
00:21:01.780 | suddenly meant that you could run
00:21:02.700 | these larger neural networks.
00:21:04.020 | And then people started to do all these amazing things.
00:21:06.300 | Back propagation algorithm was invented.
00:21:08.020 | And the neural nets people were running
00:21:10.140 | were these tiny little 20 neuron neural nets.
00:21:13.100 | What are you supposed to learn with 20 neurons?
00:21:15.220 | And so of course, they weren't able to get great results.
00:21:18.700 | And it really wasn't until 2012 that this approach,
00:21:21.980 | that's almost the most simple, natural approach
00:21:24.700 | that people had come up with in the '50s.
00:21:27.740 | In some ways even in the '40s before there were computers
00:21:30.380 | with the McCulloch-Pitts neuron,
00:21:33.060 | suddenly this became the best way of solving problems.
00:21:37.500 | I think there are three core properties
00:21:39.300 | that deep learning has that I think
00:21:42.100 | are very worth paying attention to.
00:21:44.140 | The first is generality.
00:21:45.940 | We have a very small number of deep learning tools.
00:21:48.740 | SGD, deep neural net, maybe some RL.
00:21:52.340 | And it solves this huge variety of problems.
00:21:55.620 | Speech recognition, machine translation, game playing,
00:21:58.580 | all of these problems, small set of tools.
00:22:01.020 | So there's the generality.
00:22:02.780 | There's a second piece, which is the competence.
00:22:05.020 | You wanna solve any of those problems?
00:22:07.060 | Throw out 40 years worth of normal computer vision research,
00:22:10.660 | replace it with a deep neural net, it's gonna work better.
00:22:13.620 | And there's a third piece, which is the scalability.
00:22:16.900 | One thing that has been shown time and time again
00:22:18.700 | is that if you have a larger neural network,
00:22:21.780 | throw more compute, more data at it, it will work better.
00:22:25.160 | Those three properties together feel like essential parts
00:22:28.900 | of building a general intelligence.
00:22:30.860 | Now it doesn't just mean that if we scale up what we have,
00:22:33.820 | that we will have an AGI.
00:22:35.220 | There are clearly missing pieces, there are missing ideas.
00:22:38.040 | We need to have answers for reasoning.
00:22:40.020 | But I think that the core here is that for the first time,
00:22:44.840 | it feels that we have a paradigm that gives us hope
00:22:47.980 | that general intelligence can be achievable.
00:22:50.620 | And so as soon as you believe that,
00:22:52.180 | everything else comes into focus.
00:22:54.500 | If you imagine that you may be able to,
00:22:56.620 | and that the timeline I think remains uncertain,
00:22:59.900 | but I think that certainly within our lifetimes
00:23:02.220 | and possibly within a much shorter period of time
00:23:04.660 | than people would expect,
00:23:06.580 | if you can really build the most transformative technology
00:23:09.380 | that will ever exist,
00:23:10.660 | you stop thinking about yourself so much.
00:23:12.620 | You start thinking about just like,
00:23:14.260 | how do you have a world where this goes well?
00:23:16.460 | And that you need to think about the practicalities
00:23:18.180 | of how do you build an organization
00:23:19.540 | and get together a bunch of people and resources
00:23:22.020 | and to make sure that people feel motivated
00:23:25.160 | and ready to do it.
00:23:28.060 | But I think that then you start thinking about,
00:23:30.740 | well, what if we succeed?
00:23:32.100 | And how do we make sure that when we succeed,
00:23:34.260 | that the world is actually the place
00:23:35.620 | that we want ourselves to exist in,
00:23:38.260 | and almost in the Rawlsian veil sense of the word.
00:23:41.060 | And so that's kind of the broader landscape.
00:23:43.900 | And OpenAI was really formed in 2015
00:23:46.700 | with that high level picture of AGI might be possible
00:23:51.500 | sooner than people think,
00:23:52.900 | and that we need to try to do our best
00:23:55.860 | to make sure it's going to go well.
00:23:57.460 | And then we spent the next couple of years
00:23:59.340 | really trying to figure out what does that mean?
00:24:00.860 | How do we do it?
00:24:01.980 | And I think that typically with a company,
00:24:04.780 | you start out very small,
00:24:07.300 | so you and a co-founder and you build a product,
00:24:09.020 | you get some users, you get a product market fit.
00:24:11.100 | Then at some point you raise some money,
00:24:13.300 | you hire people, you scale,
00:24:14.860 | and then down the road,
00:24:16.660 | then the big companies realize you exist
00:24:18.100 | and try to kill you.
00:24:19.100 | And for OpenAI, it was basically everything
00:24:21.540 | in exactly the opposite order.
00:24:23.040 | (laughing)
00:24:25.460 | - Let me just pause for a second.
00:24:26.780 | You said a lot of things,
00:24:27.620 | and let me just admire the jarring aspect
00:24:31.260 | of what OpenAI stands for,
00:24:33.420 | which is daring to dream.
00:24:35.220 | I mean, you said it, it's pretty powerful.
00:24:37.140 | It caught me off guard,
00:24:38.100 | because I think that's very true.
00:24:39.740 | The step of just daring to dream
00:24:44.100 | about the possibilities of creating intelligence
00:24:46.740 | in a positive and a safe way,
00:24:48.780 | but just even creating intelligence
00:24:50.660 | is a much needed refreshing catalyst
00:24:56.340 | for the AI community.
00:24:57.420 | So that's the starting point.
00:24:58.820 | Okay, so then formation of OpenAI.
00:25:01.440 | - I would just say that when we were starting OpenAI,
00:25:05.660 | that kind of the first question that we had is,
00:25:07.780 | is it too late to start a lab
00:25:10.340 | with a bunch of the best people?
00:25:12.020 | Right, is that even possible?
00:25:13.220 | - That was an actual question.
00:25:14.420 | - That was the core question of,
00:25:17.340 | we had this dinner in July of 2015,
00:25:19.340 | and that was really what we spent
00:25:21.020 | the whole time talking about.
00:25:22.420 | And, you know, 'cause it's,
00:25:24.820 | you think about kind of where AI was,
00:25:26.820 | is that it transitioned from being an academic pursuit
00:25:30.240 | to an industrial pursuit.
00:25:32.260 | And so a lot of the best people
00:25:33.320 | were in these big research labs,
00:25:35.360 | and that we wanted to start our own one
00:25:37.040 | that, you know, no matter how much resources
00:25:39.260 | we could accumulate,
00:25:40.580 | would be, you know, pale in comparison
00:25:42.300 | to the big tech companies.
00:25:43.540 | And we knew that.
00:25:44.740 | And there was a question of,
00:25:45.800 | are we going to be actually able
00:25:47.300 | to get this thing off the ground?
00:25:48.740 | You need a critical mass.
00:25:49.760 | You can't just do you and a co-founder, build a product.
00:25:51.940 | Right, you really need to have a group of,
00:25:53.940 | you know, five to 10 people.
00:25:55.620 | And we kind of concluded it wasn't obviously impossible.
00:25:59.500 | So it seemed worth trying.
00:26:00.860 | - Well, you're also a dreamer,
00:26:03.500 | so who knows, right?
00:26:04.820 | - That's right.
00:26:05.660 | - Okay, so speaking of that,
00:26:07.740 | competing with the big players,
00:26:10.520 | let's talk about some of the tricky things
00:26:14.060 | as you think through this process of growing,
00:26:17.460 | of seeing how you can develop these systems
00:26:20.100 | at scale that competes.
00:26:22.640 | So you recently formed OpenAI LP,
00:26:25.700 | a new capped-profit company that now carries the name OpenAI.
00:26:30.780 | So OpenAI is now this official company.
00:26:33.300 | The original nonprofit company still exists
00:26:36.500 | and carries the OpenAI nonprofit name.
00:26:39.780 | So can you explain what this company is,
00:26:41.980 | what the purpose of its creation is,
00:26:44.260 | and how did you arrive at the decision to create it?
00:26:48.820 | - OpenAI, the whole entity,
00:26:50.860 | and OpenAI LP as a vehicle,
00:26:53.280 | is trying to accomplish the mission
00:26:55.560 | of ensuring that artificial general intelligence
00:26:57.520 | benefits everyone.
00:26:58.800 | And the main way that we're trying to do that
00:27:00.240 | is by actually trying to build
00:27:01.840 | general intelligence ourselves
00:27:03.240 | and make sure the benefits are distributed to the world.
00:27:05.900 | That's the primary way.
00:27:07.160 | We're also fine if someone else does this, right?
00:27:09.560 | It doesn't have to be us.
00:27:10.640 | If someone else is going to build an AGI
00:27:12.600 | and make sure that the benefits don't get locked up
00:27:14.800 | in one company or, you know, with one set of people,
00:27:19.520 | we're actually fine with that.
00:27:21.120 | And so those ideas are baked into our charter,
00:27:25.360 | which is kind of the foundational document
00:27:28.360 | that describes kind of our values and how we operate.
00:27:31.880 | It's also really baked into the structure of OpenAI LP.
00:27:36.360 | And so the way that we've set up OpenAI LP
00:27:37.920 | is that in the case where we succeed, right,
00:27:42.120 | if we actually build what we're trying to build,
00:27:45.280 | then investors are able to get a return,
00:27:48.320 | but that return is something that is capped.
00:27:50.400 | And so if you think of AGI in terms of the value
00:27:52.960 | that you could really create,
00:27:54.140 | you're talking about the most transformative technology
00:27:56.280 | ever created, it's going to create orders of magnitude
00:27:58.820 | more value than any existing company,
00:28:01.840 | and that all of that value will be owned by the world,
00:28:05.920 | like legally titled to the nonprofit to fulfill that mission.
00:28:09.520 | And so that's the structure.
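[Editorial note, not part of the conversation: a toy numeric sketch of the capped-return idea Greg describes. The 100x multiple comes from OpenAI's public LP announcement for first-round investors, not from this conversation, and the dollar figures below are invented for illustration.]

```python
# Hypothetical illustration of a capped investor return (not OpenAI's actual accounting).
def investor_payout(invested: float, uncapped_return: float, cap_multiple: float = 100.0) -> float:
    """Investors receive at most cap_multiple * invested; anything beyond
    the cap is owned by the nonprofit to fulfill the mission."""
    return min(uncapped_return, cap_multiple * invested)

invested = 10_000_000.0        # hypothetical $10M investment
uncapped = 50_000_000_000.0    # hypothetical $50B share of value in a success case
print(investor_payout(invested, uncapped))  # 1_000_000_000.0 -> capped at $1B (100x)
# In this toy example, the remaining $49B would flow to the nonprofit.
```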
00:28:12.760 | - So the mission is a powerful one,
00:28:15.160 | and it's one that I think most people would agree with.
00:28:18.880 | It's how we would hope AI progresses.
00:28:22.920 | And so how do you tie yourself to that mission?
00:28:25.360 | How do you make sure you do not deviate from that mission,
00:28:29.200 | that other incentives that are profit-driven
00:28:34.200 | don't interfere with the mission?
00:28:36.760 | - So this was actually a really core question for us
00:28:39.560 | for the past couple of years,
00:28:40.900 | because I'd say that the way that our history went
00:28:43.520 | was that for the first year,
00:28:44.920 | we were getting off the ground, right?
00:28:46.200 | We had this high-level picture,
00:28:47.920 | but we didn't know exactly how we wanted to accomplish it.
00:28:51.840 | And really two years ago is when we first started realizing
00:28:55.000 | in order to build AGI,
00:28:56.120 | we're just going to need to raise way more money
00:28:58.680 | than we can as a nonprofit.
00:29:00.200 | And we're talking many billions of dollars.
00:29:02.800 | And so the first question is,
00:29:05.440 | how are you supposed to do that
00:29:06.840 | and stay true to this mission?
00:29:08.700 | And we looked at every legal structure out there
00:29:10.560 | and concluded none of them were quite right
00:29:11.960 | for what we wanted to do.
00:29:13.400 | And I guess it shouldn't be too surprising
00:29:14.600 | if you're going to do some crazy unprecedented technology
00:29:16.920 | that you're going to have to come
00:29:17.880 | with some crazy unprecedented structure to do it in.
00:29:20.320 | And a lot of our conversation was with people at OpenAI,
00:29:25.320 | the people who really joined
00:29:27.240 | because they believe so much in this mission
00:29:29.120 | and thinking about how do we actually raise the resources
00:29:32.080 | to do it and also stay true to what we stand for.
00:29:35.880 | And the place you got to start is to really align
00:29:37.960 | on what is it that we stand for?
00:29:39.520 | What are those values?
00:29:40.520 | What's really important to us?
00:29:41.800 | And so I'd say that we spent about a year
00:29:43.720 | really compiling the OpenAI charter
00:29:46.240 | and that determines,
00:29:47.520 | and if you even look at the first line item in there,
00:29:50.240 | it says that, look, we expect we're going to have
00:29:51.800 | to marshal huge amounts of resources,
00:29:53.720 | but we're going to make sure
00:29:55.120 | that we minimize conflict of interest with the mission.
00:29:57.600 | And that kind of aligning on all of those pieces
00:30:00.680 | was the most important step towards figuring out
00:30:04.200 | how do we structure a company
00:30:06.000 | that can actually raise the resources
00:30:08.200 | to do what we need to do.
00:30:10.320 | - I imagine OpenAI, the decision to create OpenAI LP
00:30:14.760 | was a really difficult one
00:30:16.320 | and there was a lot of discussions,
00:30:17.880 | as you mentioned, for a year
00:30:19.600 | and there was different ideas,
00:30:22.720 | perhaps detractors within OpenAI,
00:30:25.120 | sort of different paths that you could have taken.
00:30:28.900 | What were those concerns?
00:30:30.200 | What were the different paths considered?
00:30:32.040 | What was that process of making that decision like?
00:30:34.080 | - Yep, so if you look actually at the OpenAI charter,
00:30:37.200 | that there's almost two paths embedded within it.
00:30:40.900 | There is, we are primarily trying to build AGI ourselves,
00:30:44.880 | but we're also okay if someone else does it.
00:30:47.320 | And this is a weird thing for a company.
00:30:49.040 | - It's really interesting, actually.
00:30:50.480 | - Yeah.
00:30:51.320 | - There is an element of competition
00:30:53.260 | that you do want to be the one that does it,
00:30:56.640 | but at the same time, you're okay if somebody else does it.
00:30:58.720 | And we'll talk about that a little bit,
00:31:00.200 | that trade-off, that dance, that's really interesting.
00:31:02.920 | - And I think this was the core tension
00:31:04.560 | as we were designing OpenAI LP
00:31:06.320 | and really the OpenAI strategy,
00:31:08.200 | is how do you make sure that both you have a shot
00:31:11.040 | at being a primary actor,
00:31:12.600 | which really requires building an organization,
00:31:15.760 | raising massive resources,
00:31:17.640 | and really having the will to go
00:31:19.380 | and execute on some really, really hard vision.
00:31:22.000 | You need to really sign up for a long period
00:31:23.760 | to go and take on a lot of pain and a lot of risk.
00:31:27.120 | And to do that, normally,
00:31:29.680 | you just import the startup mindset, right?
00:31:31.720 | And that you think about, okay,
00:31:32.960 | how do we out-execute everyone?
00:31:34.280 | You have this very competitive angle.
00:31:36.180 | But you also have the second angle of saying that,
00:31:38.160 | well, the true mission isn't for OpenAI to build AGI.
00:31:41.600 | The true mission is for AGI to go well for humanity.
00:31:45.120 | And so how do you take all of those first actions
00:31:48.120 | and make sure you don't close the door on outcomes
00:31:51.320 | that would actually be positive and fulfill the mission?
00:31:54.520 | And so I think it's a very delicate balance, right?
00:31:56.680 | And I think that going 100% one direction or the other
00:31:59.560 | is clearly not the correct answer.
00:32:01.320 | And so I think that even in terms
00:32:02.840 | of just how we talk about OpenAI and think about it,
00:32:05.400 | there's just like one thing
00:32:07.080 | that's always in the back of my mind
00:32:08.520 | is to make sure that we're not just saying
00:32:11.220 | OpenAI's goal is to build AGI, right?
00:32:14.000 | That it's actually much broader than that, right?
00:32:15.560 | That, first of all, it's not just AGI,
00:32:18.240 | it's safe AGI that's very important.
00:32:20.280 | But secondly, our goal isn't to be the ones to build it,
00:32:23.060 | our goal is to make sure it goes well for the world.
00:32:24.640 | And so I think that figuring out
00:32:26.080 | how do you balance all of those
00:32:27.560 | and to get people to really come to the table
00:32:30.220 | and compile a single document
00:32:34.000 | that encompasses all of that wasn't trivial.
00:32:37.520 | - So part of the challenge here is your mission is,
00:32:41.640 | I would say, beautiful, empowering,
00:32:44.200 | and a beacon of hope for people in the research community
00:32:47.480 | and just people thinking about AI.
00:32:49.160 | So your decisions are scrutinized more than,
00:32:53.120 | I think, a regular profit-driven company.
00:32:55.880 | Do you feel the burden of this in the creation of the charter
00:32:58.480 | and just in the way you operate?
00:33:00.160 | - Yes.
00:33:01.000 | (laughing)
00:33:03.000 | - So why do you lean into the burden
00:33:05.880 | by creating such a charter?
00:33:08.640 | Why not keep it quiet?
00:33:10.400 | - I mean, it just boils down to the mission, right?
00:33:12.880 | Like, I'm here and everyone else is here
00:33:15.180 | because we think this is the most important mission.
00:33:17.880 | - Dare to dream.
00:33:19.000 | All right, so do you think you can be good for the world
00:33:23.360 | or create an AGI system that's good
00:33:25.960 | when you're a for-profit company?
00:33:28.320 | From my perspective, I don't understand
00:33:30.680 | why profit interferes with positive impact on society.
00:33:35.680 | I don't understand why Google,
00:33:40.760 | that makes most of its money from ads,
00:33:42.920 | can't also do good for the world
00:33:45.020 | or other companies, Facebook, anything.
00:33:47.480 | I don't understand why those have to interfere.
00:33:50.160 | Profit isn't the thing, in my view,
00:33:55.080 | that affects the impact of a company.
00:33:57.200 | What affects the impact of the company is the charter,
00:34:00.340 | is the culture, is the people inside,
00:34:04.120 | and profit is the thing that just fuels those people.
00:34:07.080 | So what are your views there?
00:34:08.760 | - Yeah, so I think that's a really good question,
00:34:10.880 | and there's some real longstanding debates
00:34:14.160 | in human society that are wrapped up in it.
00:34:16.440 | The way that I think about it is just think about
00:34:18.640 | what are the most impactful non-profits in the world?
00:34:21.460 | What are the most impactful for-profits in the world?
00:34:26.720 | - Right, it's much easier to list the for-profits.
00:34:29.240 | - That's right.
00:34:30.080 | - I think that there's some real truth here
00:34:32.400 | that the system that we set up,
00:34:34.600 | the system for how today's world is organized
00:34:38.280 | is one that really allows for huge impact,
00:34:41.680 | and that part of that is that you need to be,
00:34:45.160 | for-profits are self-sustaining and able
00:34:48.480 | to build on their own momentum.
00:34:51.160 | I think that's a really powerful thing.
00:34:53.020 | It's something that, when it turns out
00:34:55.820 | that we haven't set the guardrails correctly,
00:34:57.860 | causes problems, right?
00:34:58.800 | Think about logging companies that go
00:35:00.120 | and deforest the rainforest.
00:35:02.700 | That's really bad.
00:35:03.660 | We don't want that.
00:35:04.680 | And it's actually really interesting to me
00:35:06.480 | that kind of this question of how do you get
00:35:08.920 | positive benefits out of a for-profit company,
00:35:11.360 | it's actually very similar to how do you get
00:35:13.000 | positive benefits out of an AGI, right?
00:35:15.800 | That you have this very powerful system.
00:35:17.960 | It's more powerful than any human,
00:35:19.680 | and it's kind of autonomous in some ways.
00:35:21.720 | You know, it's superhuman in a lot of axes,
00:35:23.800 | and somehow you have to set the guardrails
00:35:25.400 | to get good things to happen.
00:35:26.800 | But when you do, the benefits are massive.
00:35:29.360 | And so I think that when I think about
00:35:32.480 | non-profit versus for-profit, I think just
00:35:35.140 | not enough happens in non-profits.
00:35:36.720 | They're very pure, but it's just kind of,
00:35:39.160 | it's just hard to do things there.
00:35:40.840 | In for-profits in some ways, like too much happens.
00:35:43.960 | But if kind of shaped in the right way,
00:35:46.440 | it can actually be very positive.
00:35:47.920 | And so with OpenAI LP, we're picking a road in between.
00:35:52.120 | Now, the thing that I think is really important
00:35:53.880 | to recognize is that the way that we think about OpenAI LP
00:35:57.160 | is that in the world where AGI actually happens, right?
00:36:00.440 | In a world where we are successful,
00:36:01.720 | we build the most transformative technology ever,
00:36:03.800 | the amount of value we're gonna create will be astronomical.
00:36:06.680 | And so then in that case, that the cap that we have
00:36:12.640 | will be a small fraction of the value we create.
00:36:15.560 | And the amount of value that goes back to investors
00:36:17.840 | and employees looks pretty similar to what would happen
00:36:20.000 | in a pretty successful startup.
00:36:23.800 | And that's really the case that we're optimizing for, right?
00:36:26.640 | That we're thinking about in the success case,
00:36:28.680 | making sure that the value we create doesn't get locked up.
00:36:32.160 | And I expect that in other for-profit companies
00:36:35.040 | that it's possible to do something like that.
00:36:37.920 | I think it's not obvious how to do it, right?
00:36:39.840 | I think that as a for-profit company,
00:36:41.560 | you have a lot of fiduciary duty to your shareholders
00:36:44.360 | and that there are certain decisions
00:36:45.760 | that you just cannot make.
00:36:47.640 | In our structure, we've set it up so that
00:36:50.360 | we have a fiduciary duty to the charter.
00:36:52.580 | That we always get to make the decision
00:36:54.440 | that is right for the charter,
00:36:56.760 | rather than even if it comes at the expense
00:36:58.860 | of our own stakeholders.
00:37:00.720 | And so I think that when I think about
00:37:03.440 | what's really important,
00:37:04.400 | it's not really about non-profit versus for-profit.
00:37:06.320 | It's really a question of if you build AGI
00:37:09.640 | and you kind of, you know, humanity's now in this new age,
00:37:13.120 | who benefits?
00:37:14.400 | Whose lives are better?
00:37:15.800 | And I think that what's really important
00:37:17.160 | is to have an answer that is everyone.
00:37:20.320 | - Yeah, which is one of the core aspects of the charter.
00:37:23.380 | So one concern people have, not just with OpenAI,
00:37:26.540 | but with Google, Facebook, Amazon,
00:37:28.420 | anybody really that's creating impact at scale,
00:37:33.420 | is how do we avoid, as your charter says,
00:37:37.700 | avoid enabling the use of AI or AGI
00:37:40.100 | to unduly concentrate power?
00:37:43.660 | Why would not a company like OpenAI
00:37:45.940 | keep all the power of an AGI system to itself?
00:37:48.660 | - The charter?
00:37:49.540 | - The charter.
00:37:50.380 | So, you know, how does the charter
00:37:52.020 | actualize itself in day-to-day?
00:37:57.260 | - So I think that first, to zoom out, right,
00:38:00.500 | that the way that we structure the company
00:38:01.860 | is so that the power for sort of, you know,
00:38:04.540 | dictating the actions that OpenAI takes
00:38:06.740 | ultimately rests with the board, right?
00:38:08.580 | The board of the non-profit.
00:38:10.660 | And the board is set up in certain ways,
00:38:12.340 | with certain restrictions that you can read about
00:38:14.260 | in the OpenAI LP blog post.
00:38:16.280 | But effectively, the board is the governing body
00:38:19.180 | for OpenAI LP.
00:38:21.180 | And the board has a duty to fulfill the mission
00:38:24.380 | of the non-profit.
00:38:26.380 | And so that's kind of how we tie,
00:38:28.780 | how we thread all these things together.
00:38:30.940 | Now, there's a question of, so day-to-day,
00:38:32.860 | how do people, the individuals,
00:38:34.780 | who in some ways are the most empowered ones, right?
00:38:36.900 | You know, the board sort of gets to call the shots
00:38:38.780 | at the high level,
00:38:39.780 | but the people who are actually executing
00:38:41.880 | are the employees, right?
00:38:43.100 | The people here on a day-to-day basis
00:38:44.820 | who have the, you know, the keys to the technical kingdom.
00:38:48.940 | And there, I think that the answer looks a lot like,
00:38:51.700 | well, how does any company's values get actualized, right?
00:38:55.100 | And I think that a lot of that comes down to
00:38:56.700 | that you need people who are here
00:38:58.120 | because they really believe in that mission,
00:39:01.300 | and they believe in the charter,
00:39:02.780 | and that they are willing to take actions
00:39:05.420 | that maybe are worse for them,
00:39:07.060 | but are better for the charter.
00:39:08.580 | And that's something that's really baked into the culture.
00:39:11.420 | And honestly, I think it's, you know,
00:39:13.180 | I think that that's one of the things
00:39:14.540 | that we really have to work to preserve as time goes on.
00:39:18.140 | And that's a really important part
00:39:19.740 | of how we think about hiring people
00:39:21.620 | and bringing people into OpenAI.
00:39:23.020 | - So there's people here,
00:39:24.660 | people who could speak up and say,
00:39:28.020 | like, hold on a second,
00:39:30.820 | this is totally against what we stand for, culture-wise.
00:39:34.540 | - Yeah, yeah, for sure.
00:39:35.380 | I mean, I think that we actually have,
00:39:37.060 | I think that's like a pretty important part
00:39:38.720 | of how we operate and how we have,
00:39:41.860 | even again, with designing the charter
00:39:44.140 | and designing OpenAI LP in the first place,
00:39:46.660 | that there has been a lot of conversation
00:39:48.740 | with employees here,
00:39:49.740 | and a lot of times where employees said, wait a second,
00:39:52.400 | this seems like it's going in the wrong direction,
00:39:53.940 | and let's talk about it.
00:39:55.140 | And so I think one thing that's, I think, a really,
00:39:57.380 | and, you know, here's actually one thing
00:39:58.900 | that I think is very unique about us as a small company,
00:40:02.140 | is that if you're at a massive tech giant,
00:40:04.380 | that's a little bit hard for someone
00:40:05.720 | who's a line employee to go and talk to the CEO
00:40:08.140 | and say, I think that we're doing this wrong.
00:40:10.580 | And, you know, you look at companies like Google
00:40:13.060 | that have had some collective action from employees
00:40:15.740 | to, you know, make ethical change around things like Maven.
00:40:19.420 | And so maybe there are mechanisms
00:40:20.700 | like that at other companies that work,
00:40:22.260 | but here, super easy for anyone to pull me aside,
00:40:26.340 | to pull Sam aside, to pull Ilya aside,
00:40:26.340 | and people do it all the time.
00:40:27.780 | - One of the interesting things in the charter
00:40:29.820 | is this idea that it'd be great
00:40:31.660 | if you could try to describe or untangle
00:40:34.260 | switching from competition to collaboration
00:40:36.460 | in late stage AGI development.
00:40:38.820 | It's really interesting,
00:40:39.780 | this dance between competition and collaboration.
00:40:42.180 | How do you think about that?
00:40:43.420 | - Yeah, assuming that you can actually
00:40:44.860 | do the technical side of AGI development.
00:40:47.060 | I think there's going to be two key problems
00:40:48.980 | with figuring out how do you actually deploy it
00:40:50.420 | and make it go well.
00:40:51.540 | The first one of these is the run-up
00:40:53.180 | to building the first AGI.
00:40:56.380 | You look at how self-driving cars are being developed,
00:40:58.940 | and it's a competitive race.
00:41:00.700 | And the thing that always happens in a competitive race
00:41:02.580 | is that you have huge amounts of pressure
00:41:04.180 | to get rid of safety.
00:41:05.620 | And so that's one thing we're very concerned about, right,
00:41:08.940 | is that people, multiple teams figuring out
00:41:12.020 | we can actually get there,
00:41:13.620 | but if we took the slower path
00:41:16.700 | that is more guaranteed to be safe, we will lose.
00:41:20.260 | And so we're gonna take the fast path.
00:41:22.380 | And so the more that we can, both ourselves,
00:41:25.500 | be in a position where we don't generate
00:41:27.300 | that competitive race, where we say,
00:41:29.020 | if the race is being run and that someone else
00:41:31.540 | is further ahead than we are,
00:41:33.300 | we're not gonna try to leapfrog.
00:41:35.620 | We're gonna actually work with them, right?
00:41:37.220 | We will help them succeed.
00:41:38.820 | As long as what they're trying to do
00:41:40.460 | is to fulfill our mission, then we're good.
00:41:42.940 | We don't have to build AGI ourselves.
00:41:44.780 | And I think that's a really important commitment from us,
00:41:47.060 | but it can't just be unilateral, right?
00:41:49.060 | I think that it's really important
00:41:50.340 | that other players who are serious about building AGI
00:41:53.060 | make similar commitments, right?
00:41:54.620 | And I think that, again, to the extent
00:41:56.980 | that everyone believes that AGI should be something
00:41:58.780 | to benefit everyone, then it actually really shouldn't matter
00:42:01.140 | which company builds it.
00:42:02.380 | And we should all be concerned about the case
00:42:04.060 | where we just race so hard to get there
00:42:06.020 | that something goes wrong.
00:42:07.580 | - So what role do you think government,
00:42:10.500 | our favorite entity, has in setting policy and rules
00:42:13.780 | about this domain, from research to the development
00:42:18.260 | to early stage to late stage AI and AGI development?
00:42:22.860 | - So I think that, first of all,
00:42:25.620 | it's really important that government's in there, right?
00:42:28.060 | In some way, shape, or form.
00:42:29.780 | At the end of the day, we're talking about
00:42:30.900 | building technology that will shape how the world operates
00:42:35.060 | and that there needs to be government
00:42:37.260 | as part of that answer.
00:42:39.020 | And so that's why we've done a number
00:42:42.180 | of different congressional testimonies,
00:42:43.620 | we interact with a number of different lawmakers,
00:42:46.260 | and right now, a lot of our message to them
00:42:50.000 | is that it's not the time for regulation,
00:42:54.300 | it is the time for measurement, right?
00:42:56.380 | That our main policy recommendation is that people,
00:42:59.020 | and the government does this all the time
00:43:00.640 | with bodies like NIST, spend time trying to figure out
00:43:04.860 | just where the technology is, how fast it's moving,
00:43:07.900 | and can really become literate and up to speed
00:43:11.180 | with respect to what to expect.
00:43:13.460 | So I think that today, the answer really
00:43:15.220 | is about measurement, and I think that there will be
00:43:18.300 | a time and place where that will change.
00:43:21.700 | And I think it's a little bit hard to predict exactly
00:43:24.820 | what exactly that trajectory should look like.
00:43:27.100 | - So there will be a point at which regulation,
00:43:31.020 | federal in the United States, the government steps in
00:43:34.180 | and helps be the, I don't wanna say the adult in the room,
00:43:39.180 | to make sure that there is strict rules,
00:43:42.380 | maybe conservative rules that nobody can cross.
00:43:45.220 | - Well, I think there's kind of maybe two angles to it.
00:43:47.420 | So today, with narrow AI applications,
00:43:49.780 | that I think there are already existing bodies
00:43:51.940 | that are responsible and should be responsible
00:43:53.940 | for regulation, you think about, for example,
00:43:55.840 | with self-driving cars, that you want the National Highway--
00:43:59.540 | - NHTSA. - Exactly, to be the regulator,
00:44:02.420 | and that makes sense, right?
00:44:04.040 | That basically what we're saying is that we're going
00:44:05.380 | to have these technological systems
00:44:08.140 | that are going to be performing applications
00:44:10.620 | that humans already do, great.
00:44:12.700 | We already have ways of thinking about standards
00:44:14.820 | and safety for those.
00:44:16.140 | So I think actually empowering those regulators today
00:44:18.860 | is also pretty important.
00:44:20.020 | And then I think for AGI, that there's going to be a point
00:44:24.740 | where we'll have better answers,
00:44:26.000 | and I think that maybe a similar approach
00:44:27.580 | of first measurement and start thinking about
00:44:30.500 | what the rules should be.
00:44:31.620 | I think it's really important that we don't
00:44:33.900 | prematurely squash progress.
00:44:36.260 | I think it's very easy to kind of smother a budding field,
00:44:40.140 | and I think that's something to really avoid.
00:44:42.140 | But I don't think that the right way of doing it
00:44:43.740 | is to say, let's just try to blaze ahead
00:44:46.900 | and not involve all these other stakeholders.
00:44:50.280 | - So you recently released a paper
00:44:54.780 | on GPT-2 language modeling,
00:44:57.380 | but did not release the full model
00:45:02.040 | because you had concerns about the possible
00:45:04.380 | negative effects of the availability of such model.
00:45:07.460 | It's outside of just that decision,
00:45:10.700 | it's super interesting because of the discussion
00:45:14.620 | at a societal level, the discourse it creates.
00:45:16.980 | So it's fascinating in that aspect.
00:45:19.260 | But if you think, that's the specifics here at first,
00:45:22.860 | what are some negative effects that you envisioned?
00:45:25.900 | And of course, what are some of the positive effects?
00:45:28.540 | - Yeah, so again, I think to zoom out,
00:45:30.580 | like the way that we thought about GPT-2
00:45:34.000 | is that with language modeling,
00:45:35.760 | we are clearly on a trajectory right now
00:45:38.520 | where we scale up our models
00:45:40.860 | and we get qualitatively better performance.
00:45:44.460 | GPT-2 itself was actually just a scale up
00:45:47.320 | of a model that we'd released the previous June.
00:45:50.660 | We just ran it at a much larger scale
00:45:52.840 | and we got these results where
00:45:54.300 | it suddenly started writing coherent prose,
00:45:57.220 | which was not something we'd seen previously.
00:46:00.020 | And what are we doing now?
00:46:01.340 | Well, we're gonna scale up GPT-2 by 10X, by 100X,
00:46:04.620 | by 1000X and we don't know what we're gonna get.
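To make the "language modeling" being scaled here concrete: the training objective is plain next-token prediction, maximizing the probability of each token given the ones before it. This is the standard formulation, not anything specific to OpenAI's codebase; here $x_t$ are tokens and $\theta$ the model parameters:

$$ \mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_1, \dots, x_{t-1}\right) $$

Scaling up by 10X or 1000X changes the model size, data, and compute, but this objective stays the same.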
00:46:07.860 | And so it's very clear that the model
00:46:10.100 | that we released last June,
00:46:12.840 | I think it's kind of like, it's a good academic toy.
00:46:16.460 | It's not something that we think is something
00:46:18.900 | that can really have negative applications
00:46:20.440 | or to the extent that it can,
00:46:21.700 | that the positive of people being able to play with it
00:46:24.340 | far outweighs the possible harms.
00:46:28.300 | You fast forward to not GPT-2, but GPT-20
00:46:32.580 | and you think about what that's gonna be like.
00:46:34.700 | And I think that the capabilities
00:46:36.740 | are going to be substantive.
00:46:38.220 | And so there needs to be a point in between the two
00:46:41.140 | where you say, this is something
00:46:43.460 | where we are drawing the line
00:46:45.180 | and that we need to start thinking about the safety aspects.
00:46:48.020 | And I think for GPT-2, we could have gone either way.
00:46:50.180 | And in fact, when we had conversations internally
00:46:52.740 | that we had a bunch of pros and cons
00:46:54.760 | and it wasn't clear which one outweighed the other.
00:46:58.160 | And I think that when we announced that,
00:46:59.960 | hey, we decide not to release this model,
00:47:02.160 | then there was a bunch of conversation
00:47:03.600 | where various people said it's so obvious
00:47:05.200 | that you should have just released it.
00:47:06.400 | Other people said it's so obvious
00:47:07.540 | you should not have released it.
00:47:08.840 | And I think that that almost definitionally means
00:47:10.960 | that holding it back was the correct decision.
00:47:13.800 | If it's not obvious whether something is beneficial or not,
00:47:17.660 | you should probably default to caution.
00:47:19.720 | And so I think that the overall landscape
00:47:22.440 | for how we think about it
00:47:23.720 | is that this decision could have gone either way.
00:47:25.920 | There are great arguments in both directions,
00:47:27.920 | but for future models down the road
00:47:30.040 | and possibly sooner than you'd expect,
00:47:32.280 | 'cause scaling these things up
00:47:33.440 | doesn't actually take that long.
00:47:35.680 | Those ones you're definitely not going to want
00:47:37.880 | to release into the wild.
00:47:39.560 | And so I think that we almost view this as a test case
00:47:42.600 | and to see, can we even design,
00:47:45.280 | how do you have a society,
00:47:46.600 | or how do you have a system that goes
00:47:48.280 | from having no concept of responsible disclosure
00:47:50.480 | where the mere idea of not releasing something
00:47:53.400 | for safety reasons is unfamiliar,
00:47:55.960 | to a world where you say, okay, we have a powerful model.
00:47:58.720 | Let's at least think about it.
00:47:59.720 | Let's go through some process.
00:48:01.280 | And you think about the security community,
00:48:02.680 | it took them a long time
00:48:03.880 | to design responsible disclosure.
00:48:06.000 | You think about this question of,
00:48:07.200 | well, I have a security exploit.
00:48:08.800 | I send it to the company.
00:48:09.760 | The company is like, tries to prosecute me
00:48:12.000 | or just ignores it.
00:48:13.920 | What do I do?
00:48:16.080 | And so the alternatives of,
00:48:17.320 | oh, I just always publish my exploits,
00:48:19.120 | that doesn't seem good either.
00:48:19.960 | And so it really took a long time
00:48:21.600 | and it was bigger than any individual.
00:48:25.320 | It's really about building a whole community
00:48:27.040 | that believe that, okay, we'll have this process
00:48:28.720 | where you send it to the company.
00:48:30.120 | If they don't act in a certain time,
00:48:31.680 | then you can go public and you're not a bad person.
00:48:34.400 | You've done the right thing.
00:48:36.200 | And I think that in AI,
00:48:38.640 | part of the response to GPT-2 just proves
00:48:41.360 | that we don't have any concept of this.
00:48:43.320 | So that's the high level picture.
00:48:46.760 | And so I think this was a really important move to make
00:48:51.200 | and we could have maybe delayed it for GPT-3,
00:48:53.960 | but I'm really glad we did it for GPT-2.
00:48:56.040 | And so now you look at GPT-2 itself
00:48:57.720 | and you think about the substance of, okay,
00:48:59.400 | what are potential negative applications?
00:49:01.280 | So you have this model that's been trained on the internet,
00:49:04.080 | which is also going to be a bunch of very biased data,
00:49:06.480 | a bunch of very offensive content in there.
00:49:09.560 | And you can ask it to generate content for you
00:49:13.200 | on basically any topic, right?
00:49:14.560 | You just give it a prompt and it'll just start writing
00:49:16.760 | and it writes content like you see on the internet,
00:49:19.280 | even down to like saying advertisement
00:49:21.840 | in the middle of some of its generations.
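To make "you just give it a prompt and it'll just start writing" concrete, here is a minimal sketch using the open-source Hugging Face `transformers` port of the publicly released, smaller GPT-2 weights; this illustrates the prompting workflow only and is not the tooling OpenAI itself uses:

```python
# Minimal sketch: prompt a small GPT-2 and let it continue the text.
# Uses the Hugging Face `transformers` port of the released weights.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "In a shocking finding, scientists discovered"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation token by token; top-k sampling keeps the text varied.
output_ids = model.generate(input_ids, max_length=100, do_sample=True, top_k=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The prompt is arbitrary: the model simply keeps predicting the next token, which is why the continuations read like the internet it was trained on.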
00:49:24.160 | And you think about the possibilities
00:49:26.160 | for generating fake news or abusive content.
00:49:29.240 | And it's interesting seeing what people have done with,
00:49:31.840 | we released a smaller version of GPT-2
00:49:34.360 | and the people have done things like try to generate,
00:49:37.360 | take my own Facebook message history
00:49:40.720 | and generate more Facebook messages like me
00:49:43.320 | and people generating fake politician content
00:49:47.320 | or there's a bunch of things there
00:49:49.480 | where you at least have to think,
00:49:51.880 | is this going to be good for the world?
00:49:54.720 | There's the flip side, which is I think
00:49:56.280 | that there's a lot of awesome applications
00:49:57.800 | that we really want to see like creative applications
00:50:01.600 | in terms of if you have sci-fi authors
00:50:03.960 | that can work with this tool and come with cool ideas,
00:50:06.720 | like that seems awesome if we can write better sci-fi
00:50:09.680 | through the use of these tools.
00:50:11.280 | And we've actually had a bunch of people writing to us
00:50:13.040 | asking, hey, can we use it for a variety
00:50:16.560 | of different creative applications?
00:50:18.320 | - So the positives are actually pretty easy to imagine.
00:50:21.400 | The usual NLP applications are really interesting,
00:50:28.800 | but let's go there.
00:50:30.920 | It's kind of interesting to think about a world where,
00:50:34.240 | look at Twitter, where not just fake news,
00:50:37.880 | but smarter and smarter bots being able to spread
00:50:43.000 | in an interesting, complex, networked way, information
00:50:47.320 | that just floods out us regular human beings
00:50:50.680 | with our original thoughts.
00:50:52.760 | So what are your views of this world with GPT-20?
00:50:57.760 | How do we think about it?
00:51:01.200 | Again, it's like one of those things about in the '50s
00:51:03.480 | trying to describe the internet or the smartphone.
00:51:08.480 | What do you think about that world,
00:51:09.880 | the nature of information?
00:51:12.840 | - One possibility is that we'll always try to design systems
00:51:16.720 | that identify robot versus human,
00:51:19.640 | and we'll do so successfully.
00:51:21.240 | And so we'll authenticate that we're still human.
00:51:24.600 | And the other world is that we just accept the fact
00:51:27.520 | that we're swimming in a sea of fake news
00:51:30.360 | and just learn to swim there.
00:51:32.200 | - Well, have you ever seen the,
00:51:33.800 | there's a popular meme of a robot
00:51:38.800 | with a physical arm and pen clicking
00:51:41.840 | the "I'm not a robot" button?
00:51:43.480 | - Yeah. (laughs)
00:51:45.280 | - I think that the truth is that really trying
00:51:48.080 | to distinguish between robot and human is a losing battle.
00:51:52.200 | - Ultimately, you think it's a losing battle.
00:51:53.840 | - I think it's a losing battle, ultimately.
00:51:55.560 | I think that that is, in terms of the content,
00:51:57.840 | in terms of the actions that you can take.
00:51:59.400 | I mean, think about how captchas have gone.
00:52:01.240 | The captchas used to be a very nice, simple,
00:52:02.960 | you just have this image.
00:52:04.760 | All of our OCR is terrible.
00:52:06.360 | You put a couple of artifacts in it,
00:52:08.920 | humans are gonna be able to tell what it is.
00:52:11.520 | An AI system wouldn't be able to.
00:52:13.320 | Today, I can barely do captchas.
00:52:15.760 | And I think that this is just kind of where we're going.
00:52:18.400 | I think captchas were a moment-in-time thing.
00:52:20.440 | And as AI systems become more powerful,
00:52:22.520 | the idea that there are human capabilities
00:52:24.840 | that can be measured in a very easy, automated way,
00:52:27.600 | that AIs will not be capable of,
00:52:30.200 | I think that's just like,
00:52:31.160 | it's just an increasingly hard technical battle.
00:52:34.200 | But it's not that all hope is lost, right?
00:52:36.280 | You think about how do we already authenticate ourselves?
00:52:41.080 | We have systems, we have social security numbers,
00:52:43.480 | if you're in the US,
00:52:44.320 | or you have ways of identifying individual people.
00:52:48.800 | And having real world identity tied to digital identity
00:52:51.960 | seems like a step towards authenticating
00:52:55.440 | the source of content rather than the content itself.
00:52:58.320 | Now, there are problems with that.
00:53:00.080 | How can you have privacy and anonymity
00:53:02.400 | in a world where the only way
00:53:05.520 | you can really trust content
00:53:06.640 | is by looking at where it comes from?
00:53:08.640 | And so I think that building out good reputation networks
00:53:11.400 | may be one possible solution.
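One concrete version of "authenticating the source of content rather than the content itself" is cryptographic signing tied to an identity: the author signs what they publish, and readers verify who it came from. A minimal sketch using the Python `cryptography` library; the specific scheme here (Ed25519) is an illustrative choice, not something proposed in this conversation:

```python
# Sketch: tie content to an identity by signing it, so readers can verify
# *who* published something rather than trying to judge the text itself.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

author_key = Ed25519PrivateKey.generate()   # held privately by the author
public_key = author_key.public_key()        # published alongside their identity

post = b"An original thought from a real, identified person."
signature = author_key.sign(post)

# Anyone holding the author's public key can check the post really came from
# them; verify() raises InvalidSignature if the post or signature was altered.
public_key.verify(signature, post)
```

This shifts trust from the text (which a model can imitate) to the keyholder, which is exactly where the privacy and anonymity tension mentioned above comes from.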
00:53:14.080 | But yeah, I think that this question is not an obvious one.
00:53:17.680 | And I think that we,
00:53:19.280 | maybe sooner than we think we'll be in a world where,
00:53:21.320 | today, I often will read a tweet and be like,
00:53:24.320 | hmm, do I feel like a real human wrote this?
00:53:25.920 | Or do I feel like this is genuine?
00:53:27.520 | I feel like I can kind of judge the content a little bit.
00:53:30.120 | And I think in the future, it just won't be the case.
00:53:32.600 | You look at, for example,
00:53:33.760 | the FCC comments on net neutrality.
00:53:36.840 | It came out later that millions of those were auto-generated
00:53:41.440 | and that researchers were able
00:53:44.000 | to use various statistical techniques to detect that.
00:53:44.000 | What do you do in a world
00:53:45.080 | where those statistical techniques don't exist?
00:53:47.680 | It's just impossible to tell the difference
00:53:49.120 | between humans and AIs.
00:53:50.640 | And in fact, the most persuasive arguments
00:53:53.960 | are written by AI.
00:53:56.580 | All that stuff, it's not sci-fi anymore.
00:53:58.600 | You look at GPT-2 making a great argument
00:54:00.560 | for why recycling is bad for the world.
00:54:02.560 | You gotta read that and be like, huh, you're right.
00:54:04.440 | We are addressing different symptoms.
00:54:06.520 | - Yeah, that's quite interesting.
00:54:08.120 | I mean, ultimately it boils down to the physical world
00:54:11.360 | being the last frontier of proving,
00:54:13.680 | so you said like basically networks of people,
00:54:16.080 | humans vouching for humans in the physical world.
00:54:19.400 | And somehow the authentication ends there.
00:54:22.960 | I mean, if I had to ask you,
00:54:24.520 | I mean, you're way too eloquent for a human.
00:54:28.160 | So if I had to ask you to authenticate,
00:54:31.240 | like prove how do I know you're not a robot
00:54:33.120 | and how do you know I'm not a robot?
00:54:35.000 | I think that's, so far, in this space,
00:54:40.000 | this conversation we just had,
00:54:42.160 | the physical movements we did,
00:54:44.040 | is the biggest gap between us and AI systems
00:54:47.080 | is the physical manipulation.
00:54:49.400 | So maybe that's the last frontier.
00:54:51.360 | - Well, here's another question is,
00:54:53.080 | why is solving this problem important?
00:54:57.480 | What aspects are really important to us?
00:54:59.160 | I think that probably where we'll end up
00:55:01.240 | is we'll hone in on what do we really want
00:55:03.640 | out of knowing if we're talking to a human.
00:55:06.440 | And I think that, again, this comes down to identity.
00:55:09.480 | And so I think that the internet of the future,
00:55:11.800 | I expect to be one that will have lots of agents out there
00:55:14.880 | that will interact with you.
00:55:16.360 | But I think that the question of is this
00:55:18.640 | real flesh and blood human,
00:55:21.560 | or is this an automated system,
00:55:23.840 | may actually just be less important.
00:55:25.840 | - Let's actually go there.
00:55:27.440 | So GPT-2 is impressive, and let's look at GPT-20.
00:55:32.480 | Why is it so bad that all my friends are GPT-20?
00:55:37.480 | Why is it so important on the internet,
00:55:43.320 | do you think, to interact with only human beings?
00:55:47.360 | Why can't we live in a world where ideas can come
00:55:50.640 | from models trained on human data?
00:55:52.960 | - Yeah, I think this is actually
00:55:54.840 | a really interesting question.
00:55:55.720 | This comes back to how do you even picture a world
00:55:58.100 | with some new technology?
00:55:59.580 | And I think that one thing that I think is important
00:56:02.080 | is, let's say, honesty.
00:56:04.760 | And I think that if you have,
00:56:06.880 | almost in the Turing test style sense of technology,
00:56:11.080 | you have AIs that are pretending to be humans
00:56:13.200 | and deceiving you.
00:56:14.360 | I think that feels like a bad thing.
00:56:17.560 | I think that it's really important
00:56:18.880 | that we feel like we're in control of our environment,
00:56:21.280 | that we understand who we're interacting with.
00:56:23.400 | And if it's an AI or a human,
00:56:25.920 | that's not something that we're being deceived about.
00:56:28.680 | But I think that the flip side of can I have as meaningful
00:56:31.480 | of an interaction with an AI as I can with a human?
00:56:34.240 | Well, I actually think here you can turn to sci-fi.
00:56:36.880 | And "Her," I think, is a great example
00:56:39.040 | of asking this very question.
00:56:41.120 | One thing I really love about "Her"
00:56:42.200 | is it really starts out almost by asking
00:56:44.320 | how meaningful are human virtual relationships?
00:56:47.280 | And then you have a human who has a relationship
00:56:50.520 | with an AI and that you really start
00:56:53.160 | to be drawn into that,
00:56:54.360 | that all of your emotional buttons get triggered
00:56:56.960 | in the same way as if there was a real human
00:56:58.520 | that was on the other side of that phone.
00:57:00.680 | And so I think that this is one way of thinking about it
00:57:03.800 | is that I think that we can have meaningful interactions
00:57:07.160 | and that if there's a funny joke,
00:57:09.760 | sometimes it doesn't really matter
00:57:10.840 | if it was written by a human or an AI.
00:57:12.920 | But what you don't want,
00:57:14.080 | and where I think we should really draw hard lines,
00:57:16.400 | is deception.
00:57:17.360 | And I think that as long as we're in a world
00:57:19.560 | where, why do we build AI systems at all?
00:57:22.640 | The reason we want to build them is to enhance human lives,
00:57:25.000 | to make humans be able to do more things,
00:57:26.680 | to have humans feel more fulfilled.
00:57:29.040 | And if we can build AI systems that do that, sign me up.
00:57:33.160 | - So the process of language modeling,
00:57:35.120 | how far do you think it'd take us?
00:57:38.760 | Let's look at the movie "Her."
00:57:40.660 | Do you think a dialogue, natural language conversation
00:57:45.000 | as formulated by the Turing test, for example,
00:57:47.800 | do you think that process could be achieved
00:57:50.400 | through this kind of unsupervised language modeling?
00:57:53.120 | - So I think the Turing test in its real form
00:57:56.920 | isn't just about language.
00:57:58.640 | It's really about reasoning too.
00:58:00.720 | To really pass the Turing test,
00:58:01.880 | I should be able to teach calculus
00:58:03.880 | to whoever's on the other side,
00:58:05.540 | and have it really understand calculus,
00:58:07.480 | and be able to go and solve new calculus problems.
00:58:11.280 | And so I think that to really solve the Turing test,
00:58:13.960 | we need more than what we're seeing with language models.
00:58:16.420 | We need some way of plugging in reasoning.
00:58:18.680 | Now, how different will that be from what we already do?
00:58:22.380 | That's an open question.
00:58:23.840 | It might be that we need some sequence
00:58:25.480 | of totally radical new ideas,
00:58:27.200 | or it might be that we just need to shape
00:58:29.560 | our existing systems in a slightly different way.
00:58:31.960 | But I think that in terms of how far
00:58:34.600 | language modeling will go,
00:58:35.880 | it's already gone way further
00:58:37.480 | than many people would have expected.
00:58:39.720 | I think that things like,
00:58:40.920 | and I think there's a lot of really interesting angles
00:58:42.680 | to poke in terms of how much does GPT-2
00:58:45.880 | understand physical world?
00:58:47.880 | Like, you read a little bit about fire underwater in GPT-2,
00:58:52.320 | so it's like, okay, maybe it doesn't quite understand
00:58:54.160 | what these things are.
00:58:55.640 | But at the same time, I think that you also see
00:58:58.560 | various things like smoke coming from flame
00:59:00.640 | and a bunch of these things that GPT-2, it has no body,
00:59:03.680 | it has no physical experience,
00:59:04.880 | it's just statically read data.
00:59:07.280 | And I think that the answer is like, we don't know yet.
00:59:12.280 | And these questions though,
00:59:14.600 | we're starting to be able to actually ask them
00:59:16.220 | to physical systems, to real systems that exist,
00:59:18.720 | and that's very exciting.
00:59:19.880 | - Do you think, what's your intuition?
00:59:21.160 | Do you think if you just scale language modeling,
00:59:25.440 | like significantly scale, that reasoning can emerge
00:59:29.320 | from the same exact mechanisms?
00:59:31.280 | - I think it's unlikely that if we just scale GPT-2
00:59:34.920 | that we'll have reasoning in the full-fledged way.
00:59:38.560 | And I think that there's like,
00:59:39.760 | the type signature is a little bit wrong, right?
00:59:41.480 | That like, there's something we do with,
00:59:44.520 | that we call thinking, right?
00:59:45.760 | Where we spend a lot of compute,
00:59:47.620 | like a variable amount of compute
00:59:49.120 | to get to better answers, right?
00:59:50.640 | I think a little bit harder, I get a better answer.
00:59:53.000 | And that that kind of type signature
00:59:55.160 | isn't quite encoded in a GPT, right?
00:59:58.880 | GPT has kind of, like, spent a long time,
01:00:01.880 | like its evolutionary history,
01:00:03.600 | baking in all this information,
01:00:04.680 | getting very, very good at this predictive process.
01:00:07.000 | And then at runtime, I just kind of do one forward pass
01:00:10.320 | and am able to generate stuff.
01:00:13.240 | And so, there might be small tweaks
01:00:15.560 | to what we do in order to get the type signature, right?
01:00:18.020 | For example, well, it's not really one forward pass, right?
01:00:21.000 | You generate symbol by symbol.
01:00:22.600 | And so, maybe you generate like a whole sequence of thoughts
01:00:25.600 | and you only keep like the last bit or something.
01:00:28.240 | But I think that at the very least,
01:00:29.880 | I would expect you have to make changes like that.
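As a rough sketch of the "generate a whole sequence of thoughts and only keep the last bit" idea, here is what variable test-time compute might look like wired around the small public GPT-2 via Hugging Face `transformers`. The `answer_with_thinking` helper is hypothetical and purely illustrative, not a method OpenAI has described:

```python
# A rough sketch of "generate a whole sequence of thoughts and only keep the
# last bit": spend a variable amount of compute on intermediate text, then
# return just the final piece. Illustrative only.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def answer_with_thinking(question: str, thinking_tokens: int = 200) -> str:
    # More "thinking" means more tokens of intermediate generation.
    prompt = f"Question: {question}\nLet's think it through step by step.\n"
    ids = tokenizer.encode(prompt, return_tensors="pt")
    out = model.generate(ids, max_length=ids.shape[1] + thinking_tokens,
                         do_sample=True, top_k=40)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    continuation = text[len(prompt):].strip()    # drop the echoed prompt
    lines = continuation.splitlines()
    return lines[-1] if lines else continuation  # keep only the "last bit"

print(answer_with_thinking("Why is the sky blue?"))
```

The point of the sketch is the type signature: the `thinking_tokens` knob lets you spend a variable amount of compute before committing to an answer, which is the piece a single forward pass does not give you.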
01:00:32.200 | - Yeah, just exactly how you said,
01:00:34.760 | thinking is the process of generating thought by thought
01:00:38.440 | in the same kind of way, like you said,
01:00:40.400 | keep the last bit, the thing that we converge towards.
01:00:43.640 | - Yep.
01:00:45.040 | And I think there's another piece which is interesting,
01:00:47.320 | which is this out of distribution generalization, right?
01:00:50.280 | That like thinking somehow lets us do that, right?
01:00:52.640 | That we haven't experienced a thing
01:00:54.440 | and yet somehow we just kind of keep refining
01:00:56.080 | our mental model of it.
01:00:58.080 | This is again, something that feels tied
01:01:00.640 | to whatever reasoning is.
01:01:03.380 | And maybe it's a small tweak to what we do.
01:01:05.720 | Maybe it's many ideas and will take us many decades.
01:01:08.100 | - Yeah, so the assumption there,
01:01:09.940 | generalization out of distribution
01:01:13.260 | is that it's possible to create new ideas.
01:01:18.200 | It's possible that nobody's ever created any new ideas.
01:01:20.840 | And then with scaling GPT-2 to GPT-20,
01:01:24.840 | you would essentially generalize to all possible thoughts
01:01:29.840 | that us humans can have.
01:01:31.560 | Just to play devil's advocate.
01:01:34.800 | - I mean, how many new story ideas
01:01:37.280 | have we come up with since Shakespeare, right?
01:01:39.120 | - Yeah, exactly.
01:01:40.200 | It's just all different forms of love and drama and so on.
01:01:44.680 | Okay.
01:01:45.800 | Not sure if you read "Bitter Lesson,"
01:01:47.520 | a recent blog post by Rich Sutton.
01:01:49.400 | - Yep, I have.
01:01:50.880 | - He basically says something that echoes
01:01:53.720 | some of the ideas that you've been talking about,
01:01:55.480 | which is, he says the biggest lesson
01:01:58.320 | that can be read from 70 years of AI research
01:02:00.680 | is that general methods that leverage computation
01:02:03.880 | are ultimately going to win out.
01:02:07.920 | Do you agree with this?
01:02:08.960 | So basically, open AI in general,
01:02:12.840 | but the ideas you're exploring
01:02:14.240 | about coming up with methods,
01:02:15.880 | whether it's GPT-2 modeling
01:02:17.720 | or whether it's OpenAI Five playing Dota,
01:02:21.160 | where a general method is better
01:02:24.000 | than a more fine-tuned, expert-tuned method.
01:02:27.440 | - Yeah, so I think that,
01:02:31.360 | well, one thing that I think was really interesting
01:02:32.880 | about the reaction to that blog post
01:02:35.180 | was that a lot of people have read this
01:02:36.480 | as saying that compute is all that matters.
01:02:39.440 | And that's a very threatening idea, right?
01:02:41.360 | And I don't think it's a true idea either.
01:02:43.720 | It's very clear that we have algorithmic ideas
01:02:45.800 | that have been very important for making progress.
01:02:47.880 | And to really build AGI,
01:02:49.520 | you wanna push as far as you can on the computational scale
01:02:52.120 | and you wanna push as far as you can on human ingenuity.
01:02:55.600 | And so I think you need both.
01:02:57.040 | But I think the way that you phrased the question
01:02:58.320 | is actually very good, right?
01:02:59.640 | That it's really about what kind of ideas
01:03:02.240 | should we be striving for?
01:03:04.040 | And absolutely, if you can find a scalable idea,
01:03:07.620 | you pour more data into it, it gets better.
01:03:11.400 | Like, that's the real holy grail.
01:03:13.800 | And so I think that the answer to the question,
01:03:16.600 | I think, is yes.
01:03:18.160 | That that's really how we think about it.
01:03:19.920 | And that part of why we're excited
01:03:21.840 | about the power of deep learning,
01:03:23.320 | the potential for building AGI,
01:03:25.320 | is because we look at the systems that exist
01:03:27.600 | in the most successful AI systems
01:03:29.720 | and we realize that you scale those up,
01:03:32.680 | they're gonna work better.
01:03:34.000 | And I think that that scalability
01:03:35.820 | is something that really gives us hope
01:03:37.080 | for being able to build transformative systems.
01:03:39.560 | - So I'll tell you, this is partially an emotional,
01:03:43.160 | you know, a thing, a response that people often have
01:03:45.720 | is that compute is so important for state-of-the-art performance.
01:03:49.160 | You know, individual developers,
01:03:50.720 | maybe a 13 year old sitting somewhere in Kansas
01:03:52.920 | or something like that, you know,
01:03:54.440 | they're sitting, they might not even have a GPU
01:03:56.960 | or may have a single GPU, a 1080 or something like that.
01:04:00.080 | And there's this feeling like, well,
01:04:02.620 | how can I possibly compete or contribute
01:04:05.760 | to this world of AI if scale is so important?
01:04:09.800 | So if you can comment on that,
01:04:11.880 | and in general, do you think we need to also
01:04:14.280 | in the future focus on democratizing compute resources
01:04:18.760 | more or as much as we democratize the algorithms?
01:04:22.640 | - Well, so the way that I think about it
01:04:23.940 | is that there's this space of possible progress, right?
01:04:28.840 | There's a space of ideas and sort of systems
01:04:30.880 | that will work that will move us forward.
01:04:32.920 | And there's a portion of that space
01:04:34.800 | and to some extent, an increasingly significant portion
01:04:37.040 | of that space that does just require
01:04:38.800 | massive compute resources.
01:04:41.040 | And for that, I think that the answer is kind of clear
01:04:44.720 | and that part of why we have the structure that we do
01:04:47.920 | is because we think it's really important
01:04:49.640 | to be pushing the scale and to be, you know,
01:04:51.480 | building these large clusters and systems.
01:04:53.800 | But there's another portion of the space
01:04:55.900 | that isn't about the large scale compute
01:04:57.880 | that are these ideas that, and again,
01:04:59.960 | I think that for the ideas to really be impactful
01:05:02.200 | and really shine, that they should be ideas
01:05:04.200 | that if you scale them up, would work way better
01:05:06.660 | than they do at small scale.
01:05:08.800 | But that you can discover them
01:05:10.480 | without massive computational resources.
01:05:12.720 | And if you look at the history of recent developments,
01:05:15.160 | you think about things like the GAN or the VAE,
01:05:17.640 | that these are ones that I think you could come up with them
01:05:20.880 | without having, and you know, in practice,
01:05:22.680 | people did come up with them without having
01:05:24.480 | massive, massive computational resources.
01:05:26.520 | - Right, I just talked to Ian Goodfellow,
01:05:27.960 | but the thing is, the initial GAN
01:05:31.560 | produced pretty terrible results, right?
01:05:34.160 | So only because it was in a very specific,
01:05:36.280 | it was only because they were smart enough to know
01:05:38.620 | that this is quite surprising
01:05:39.960 | it can generate anything that they know.
01:05:43.160 | Do you see a world, or is that too optimistic
01:05:45.480 | and dreamer-like to imagine that the compute resources
01:05:49.760 | are something that's owned by governments
01:05:52.200 | and provided as utility?
01:05:55.040 | - Actually, to some extent, this question reminds me
01:05:57.120 | of a blog post from one of my former professors at Harvard,
01:06:01.160 | this guy, Matt Welsh, who was a systems professor.
01:06:03.760 | I remember sitting in his tenure talk, right,
01:06:05.280 | and he had literally just gotten tenure.
01:06:08.780 | He went to Google for the summer,
01:06:10.940 | and then decided he wasn't going back to academia, right?
01:06:15.700 | And kind of in his blog post, he makes this point
01:06:17.740 | that, look, as a systems researcher,
01:06:20.780 | that I come up with these cool system ideas, right,
01:06:23.180 | and I kind of build a little proof of concept,
01:06:25.060 | and the best thing I could hope for
01:06:27.060 | is that the people at Google or Yahoo,
01:06:30.100 | which was around at the time,
01:06:31.540 | will implement it and actually make it work at scale,
01:06:35.100 | right, that's like the dream for me, right?
01:06:36.560 | I build the little thing, and they turn it
01:06:37.920 | into the big thing that's actually working.
01:06:39.960 | And for him, he said, I'm done with that.
01:06:43.320 | I wanna be the person who's actually doing,
01:06:45.280 | building and deploying.
01:06:47.240 | And I think that there's a similar dichotomy here, right?
01:06:49.520 | I think that there are people
01:06:50.480 | who really actually find value,
01:06:53.320 | and I think it is a valuable thing to do,
01:06:55.160 | to be the person who produces those ideas, right,
01:06:57.380 | who builds the proof of concept.
01:06:58.800 | And yeah, you don't get to generate
01:07:00.520 | the coolest possible GAN images,
01:07:02.720 | but you invented the GAN, right?
01:07:04.440 | And so, there's a real trade-off there.
01:07:07.560 | And I think that that's a very personal choice,
01:07:09.020 | but I think there's value in both sides.
01:07:10.840 | - Do you think creating AGI, something,
01:07:14.600 | or some new models,
01:07:16.700 | we would see echoes of the brilliance
01:07:20.440 | even at the prototype level?
01:07:22.240 | So you would be able to develop those ideas without scale,
01:07:24.920 | the initial seeds?
01:07:27.280 | - So take a look at,
01:07:29.000 | I always like to look at examples that exist, right?
01:07:31.760 | Look at real precedent.
01:07:32.680 | And so take a look at the June 2018 model
01:07:36.200 | that we released, that we scaled up to turn into GPT-2.
01:07:39.160 | And you can see that at small scale,
01:07:41.240 | it set some records, right?
01:07:42.760 | This was the original GPT.
01:07:44.760 | We actually had some cool generations
01:07:46.800 | that weren't nearly as amazing and really stunning
01:07:49.800 | as the GPT-2 ones, but it was promising.
01:07:51.960 | It was interesting.
01:07:53.000 | And so I think it is the case
01:07:54.480 | that with a lot of these ideas,
01:07:56.080 | that you see promise at small scale.
01:07:58.240 | But there is an asterisk here, a very big asterisk,
01:08:00.780 | which is sometimes we see behaviors that emerge
01:08:05.200 | that are qualitatively different
01:08:07.240 | from anything we saw at small scale.
01:08:09.040 | And that the original inventor
01:08:10.600 | of whatever algorithm looks at it and says,
01:08:13.560 | "I didn't think it could do that."
01:08:15.480 | This is what we saw in Dota, right?
01:08:17.400 | So PPO was created by John Schulman,
01:08:19.320 | who's a researcher here.
01:08:20.520 | And with Dota, we basically just ran PPO
01:08:24.640 | at massive, massive scale.
01:08:26.480 | And there's some tweaks in order to make it work,
01:08:29.080 | but fundamentally it's PPO at the core.
01:08:31.500 | And we were able to get this long-term planning,
01:08:35.260 | these behaviors to really play out on a timescale
01:08:38.660 | that we just thought was not possible.
01:08:40.740 | And John looked at that and was like,
01:08:42.580 | "I didn't think it could do that."
01:08:44.200 | That's what happens when you're at three orders
01:08:45.420 | of magnitude more scale than you tested at.
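For reference, the core of PPO is the clipped surrogate objective below, in its standard published form; per the discussion above, the Dota system ran essentially this, with some tweaks, at far larger scale than it was originally tested at. Here $r_t(\theta)$ is the probability ratio between the new and old policies, $\hat{A}_t$ an advantage estimate, and $\epsilon$ the clip range:

$$ L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} $$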
01:08:48.340 | - Yeah, but it still has the same flavors of,
01:08:50.540 | at least echoes of the expected capabilities.
01:08:55.940 | Although I suspect with GPT scaled more and more,
01:08:58.980 | you might get surprising things.
01:09:01.720 | So yeah, you're right.
01:09:03.120 | It's interesting.
01:09:04.680 | It's difficult to see how far an idea will go
01:09:07.920 | when it's scaled.
01:09:09.240 | It's an open question.
01:09:11.000 | - Well, so to that point with Dota and PPO,
01:09:13.160 | here's a very concrete one.
01:09:14.880 | One thing that's very surprising about Dota
01:09:17.680 | that I think people don't really pay that much attention to
01:09:20.320 | is the degree of generalization out of distribution
01:09:23.120 | that happens, right?
01:09:24.520 | That you have this AI that's trained against other bots
01:09:27.800 | for its entirety, the entirety of its existence.
01:09:30.300 | - Sorry to take a step back.
01:09:31.420 | Can you talk through a story of Dota,
01:09:36.420 | a story leading up to OpenAI Five and that path,
01:09:42.020 | and what was the process of self-play
01:09:43.860 | and so on of training?
01:09:44.940 | - Yeah, yeah, yeah.
01:09:46.060 | - And what is Dota?
01:09:47.540 | - Yeah, Dota is a complex video game.
01:09:49.940 | And we started trying to solve Dota
01:09:52.660 | because we felt like this was a step towards the real world
01:09:55.620 | relative to other games like chess or Go,
01:09:57.740 | right, those are board games
01:09:59.140 | where you just kind of have this board,
01:10:00.460 | very discrete moves.
01:10:01.700 | Dota starts to be much more continuous time
01:10:04.020 | that you have this huge variety of different actions,
01:10:06.180 | that you have a 45 minute game
01:10:07.640 | with all these different units,
01:10:09.340 | and it's got a lot of messiness to it
01:10:11.820 | that really hasn't been captured by previous games.
01:10:14.460 | And famously, all of the hard-coded bots for Dota
01:10:17.300 | were terrible, right?
01:10:18.340 | It's just impossible to write anything good for it
01:10:19.900 | because it's so complex.
01:10:21.220 | And so this seemed like a really good place
01:10:23.260 | to push what's the state of the art
01:10:25.180 | in reinforcement learning.
01:10:26.780 | And so we started by focusing
01:10:28.340 | on the one versus one version of the game,
01:10:29.940 | and we're able to solve that.
01:10:32.340 | We were able to beat the world champions.
01:10:33.860 | And the learning, the skill curve
01:10:37.260 | was this crazy exponential, right?
01:10:38.940 | It was like constantly we were just scaling up,
01:10:40.980 | that we were fixing bugs,
01:10:42.240 | and that you look at the skill curve,
01:10:44.300 | and it was really a very, very smooth one.
01:10:46.500 | And this was actually really interesting
01:10:47.460 | to see how that human iteration loop
01:10:49.980 | yielded very steady exponential progress.
01:10:52.700 | - And to one side note, first of all,
01:10:55.180 | it's an exceptionally popular video game.
01:10:57.100 | The side effect is that there's a lot
01:10:59.420 | of incredible human experts at that video game.
01:11:01.940 | So the benchmark that you're trying to reach is very high.
01:11:05.220 | And the other, can you talk about the approach
01:11:07.860 | that was used initially and throughout
01:11:10.140 | training these agents to play this game?
01:11:12.060 | - Yep, and so the approach that we used is self-play.
01:11:14.380 | And so you have two agents that don't know anything.
01:11:17.340 | They battle each other.
01:11:18.680 | They discover something a little bit good,
01:11:20.780 | and now they both know it.
01:11:22.020 | And they just get better and better and better
01:11:23.380 | without bound.
01:11:24.540 | And that's a really powerful idea, right?
01:11:27.060 | That we then went from the one-versus-one version
01:11:30.180 | of the game and scaled up to five-versus-five, right?
01:11:32.420 | So you think about kind of like with basketball,
01:11:34.300 | where you have this team sport,
01:11:35.460 | and you need to do all this coordination.
01:11:37.660 | And we were able to push the same idea,
01:11:40.900 | the same self-play, to really get to the professional level
01:11:45.900 | at the full five-versus-five version of the game.
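A toy sketch of the self-play loop just described: a single shared policy plays both sides of a simple game, and whatever it discovers immediately benefits "both" players on the next episode. This is illustrative pure Python on a trivial game, not OpenAI's Dota training stack:

```python
# Toy self-play loop: one shared policy plays both sides of rock-paper-scissors.
import random

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

# One shared policy: unnormalized preferences over actions.
weights = {a: 1.0 for a in ACTIONS}

def sample_action():
    # Sample an action in proportion to its current weight.
    total = sum(weights.values())
    r = random.uniform(0, total)
    for action, w in weights.items():
        r -= w
        if r <= 0:
            return action
    return ACTIONS[-1]

for episode in range(10_000):
    # The policy plays against itself: both moves come from the same weights.
    a1, a2 = sample_action(), sample_action()
    if a1 != a2:
        winner = a1 if BEATS[a1] == a2 else a2
        # Whatever won gets reinforced, and because the policy is shared,
        # "both players" know it on the very next episode.
        weights[winner] += 0.01
```

The structure, not the toy game, is the point: collect games of the current policy against itself, update from the outcomes, repeat without bound.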
01:11:49.140 | And the things I think are really interesting here
01:11:52.620 | is that these agents, in some ways,
01:11:54.980 | they're almost like an insect-like intelligence, right?
01:11:57.020 | Where they have a lot in common
01:11:58.900 | with how an insect is trained, right?
01:12:00.340 | An insect kind of lives in this environment
01:12:02.020 | for a very long time, or the ancestors of this insect
01:12:05.140 | have been around for a long time
01:12:06.080 | and had a lot of experience.
01:12:07.220 | It gets baked into this agent.
01:12:09.900 | And it's not really smart in the sense of a human, right?
01:12:12.940 | It's not able to go and learn calculus,
01:12:14.800 | but it's able to navigate its environment extremely well.
01:12:17.140 | It's able to handle unexpected things
01:12:18.620 | in an environment that it's never seen before pretty well.
01:12:22.240 | And we see the same sort of thing with our Dota bots,
01:12:24.740 | right, that they're able to, within this game,
01:12:26.900 | they're able to play against humans,
01:12:28.620 | which is something that never existed
01:12:30.140 | in its evolutionary environment.
01:12:31.540 | Totally different play styles from humans versus the bots.
01:12:34.620 | And yet, it's able to handle it extremely well.
01:12:37.380 | And that's something that I think was very surprising to us,
01:12:40.580 | was something that doesn't really emerge
01:12:43.580 | from what we've seen with PPO at smaller scale, right?
01:12:47.380 | And the kind of scale we're running this stuff at was,
01:12:49.580 | you know, like, let's say, like 100,000 CPU cores
01:12:52.120 | running with like hundreds of GPUs.
01:12:54.260 | It was probably about, you know, like,
01:12:56.820 | something like hundreds of years of experience
01:13:00.460 | going into this bot every single real day.
01:13:04.020 | And so that scale is massive,
01:13:06.420 | and we start to see very different kinds of behaviors
01:13:08.620 | out of the algorithms that we all know and love.
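A quick back-of-the-envelope on that "hundreds of years per day" figure, under the simplifying assumption that each of the roughly 100,000 CPU cores runs about one copy of the game at around real time (actual per-core speeds will differ):

$$ 100{,}000 \ \text{parallel games} \times 1 \ \text{day} = 100{,}000 \ \text{game-days} \approx 274 \ \text{years of play per wall-clock day} $$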
01:13:10.980 | - Dota, you mentioned you beat the world expert 1v1,
01:13:15.260 | and then you weren't able to win 5v5 this year
01:13:21.340 | against the best players in the world.
01:13:24.160 | So what's the comeback story?
01:13:26.240 | What's, first of all, talk through that,
01:13:27.720 | that was an exceptionally exciting event.
01:13:29.520 | And what's the following months and this year look like?
01:13:33.240 | - Yeah, yeah, so, well, one thing that's interesting
01:13:35.320 | is that, you know, we lose all the time.
01:13:37.700 | Because we play-- - We lose? We, here?
01:13:40.080 | - So the Dota team at OpenAI,
01:13:41.760 | we play the bot against better players
01:13:44.200 | than our system all the time, or at least we used to, right?
01:13:47.480 | Like, you know, the first time we lost publicly
01:13:50.160 | was we went up on stage at the International,
01:13:52.300 | and we played against some of the best teams in the world,
01:13:54.700 | and we ended up losing both games,
01:13:56.400 | but we gave them a run for their money, right?
01:13:58.620 | Both games were kind of 30 minutes, 25 minutes,
01:14:01.500 | and they went back and forth, back and forth,
01:14:03.220 | back and forth.
01:14:04.300 | And so I think that really shows
01:14:05.980 | that we're at the professional level,
01:14:08.100 | and that kind of looking at those games,
01:14:09.740 | we think that the coin could have gone a different direction
01:14:12.380 | and we could have had some wins,
01:14:13.740 | and so that was actually very encouraging for us.
01:14:16.100 | And, you know, it's interesting
01:14:17.180 | 'cause the International was at a fixed time, right?
01:14:19.820 | So we knew exactly what day we were going to be playing,
01:14:22.820 | and we pushed as far as we could, as fast as we could.
01:14:25.620 | Two weeks later, we had a bot that had an 80% win rate
01:14:28.120 | versus the one that played at TI.
01:14:30.220 | So the march of progress, you know,
01:14:31.860 | you should think of as a snapshot
01:14:33.320 | rather than as an end state.
01:14:34.860 | And so in fact, we'll be announcing our finals pretty soon.
01:14:39.140 | I actually think that we'll announce our final match
01:14:41.940 | prior to this podcast being released.
01:14:45.180 | - Okay, nice. - So there should be,
01:14:47.020 | we'll be playing against the world champions.
01:14:49.860 | And, you know, for us, it's really less about,
01:14:52.660 | like, the way that we think about what's upcoming
01:14:55.420 | is the final milestone,
01:14:57.780 | the final competitive milestone for the project, right?
01:15:00.420 | That our goal in all of this
01:15:02.180 | isn't really about beating humans at Dota.
01:15:05.300 | Our goal is to push the state-of-the-art
01:15:06.940 | in reinforcement learning, and we've done that, right?
01:15:09.020 | And we've actually learned a lot from our system
01:15:10.820 | and that we have, you know,
01:15:12.500 | I think a lot of exciting next steps that we wanna take.
01:15:14.860 | And so, you know, kind of as a final showcase
01:15:16.580 | of what we built, we're going to do this match.
01:15:18.900 | But for us, it's not really the success or failure
01:15:21.380 | to see, you know, do we have the coin flip
01:15:23.780 | going in our direction or against?
01:15:25.940 | - Where do you see the field of deep learning
01:15:28.860 | heading in the next few years?
01:15:31.620 | Where do you see the work and reinforcement learning
01:15:35.620 | perhaps heading, and more specifically with OpenAI,
01:15:40.620 | all the exciting projects that you're working on,
01:15:44.460 | what does 2019 hold for you?
01:15:46.460 | - Massive scale.
01:15:47.420 | - Scale.
01:15:48.260 | - I will put an asterisk on that and just say,
01:15:49.660 | you know, I think that it's about ideas plus scale.
01:15:52.340 | You need both.
01:15:53.180 | - So that's a really good point.
01:15:55.060 | So the question, in terms of ideas,
01:15:58.620 | you have a lot of projects
01:16:00.620 | that are exploring different areas of intelligence.
01:16:04.380 | And the question is, when you think of scale,
01:16:07.660 | do you think about growing the scale
01:16:09.780 | of those individual projects,
01:16:10.940 | or do you think about adding new projects?
01:16:13.260 | And sorry, if you were thinking about adding new projects,
01:16:17.580 | or if you look at the past,
01:16:19.020 | what's the process of coming up with new projects
01:16:21.380 | and new ideas?
01:16:22.220 | - Yep.
01:16:23.060 | So we really have a life cycle of project here.
01:16:25.380 | So we start with a few people
01:16:27.020 | just working on a small scale idea,
01:16:28.580 | and language is actually a very good example of this,
01:16:30.700 | that it was really, you know, one person here
01:16:32.620 | who was pushing on language for a long time.
01:16:34.980 | I mean, then you get signs of life, right?
01:16:36.820 | And so this is like, let's say, you know,
01:16:38.580 | with the original GPT,
01:16:40.900 | we had something that was interesting.
01:16:42.740 | And we said, okay, it's time to scale this, right?
01:16:44.940 | It's time to put more people on it,
01:16:46.140 | put more computational resources behind it.
01:16:48.180 | And then we just kind of keep pushing and keep pushing.
01:16:51.700 | And the end state is something
01:16:52.740 | that looks like Dota or robotics,
01:16:54.420 | where you have a large team of, you know, 10 or 15 people
01:16:57.260 | that are running things at very large scale,
01:16:59.340 | and that you're able to really have material engineering
01:17:02.340 | and, you know, sort of machine learning science
01:17:05.820 | coming together to make systems that work
01:17:08.980 | and get material results
01:17:10.420 | that just would have been impossible otherwise.
01:17:12.420 | So we do that whole life cycle.
01:17:13.780 | We've done it a number of times, you know,
01:17:15.860 | typically end to end.
01:17:16.820 | It's probably two years or so to do it.
01:17:20.220 | I know the organization's been around for three years,
01:17:21.900 | so maybe we'll find that we also have
01:17:23.180 | longer life cycle projects.
01:17:24.940 | But, you know, we'll work up to those.
01:17:27.840 | We have, so one team that we were actually just starting,
01:17:31.620 | Ilya and I are kicking off a new team
01:17:33.460 | called the reasoning team.
01:17:34.660 | And this is to really try to tackle,
01:17:36.460 | how do you get neural networks to reason?
01:17:38.740 | And we think that this will be a long term project.
01:17:42.780 | It's one that we're very excited about.
01:17:44.780 | - In terms of reasoning, super exciting topic.
01:17:47.600 | What kind of benchmarks,
01:17:51.260 | what kind of tests of reasoning do you envision?
01:17:55.380 | What would, if you sat back with whatever drink
01:17:59.060 | and you would be impressed
01:18:00.620 | that this system is able to do something,
01:18:02.900 | what would that look like?
01:18:03.980 | - Theorem proving.
01:18:04.940 | - Theorem proving.
01:18:06.580 | So some kind of logic and especially mathematical logic.
01:18:10.580 | - I think so, right?
01:18:11.420 | And I think that there's kind of other problems
01:18:13.620 | that are dual to theorem proving in particular.
01:18:15.780 | You know, you think about programming,
01:18:18.020 | you think about even like security analysis of code,
01:18:21.300 | that these all kind of capture
01:18:23.020 | the same sorts of core reasoning
01:18:25.380 | and being able to do some out of distribution
01:18:27.460 | generalization.
01:18:28.380 | - It would be quite exciting if the OpenAI reasoning team
01:18:32.660 | was able to prove that P equals NP.
01:18:34.780 | That would be very nice.
01:18:36.100 | It would be very, very exciting,
01:18:37.620 | especially if it turns out that P equals NP,
01:18:39.820 | that'll be interesting too.
01:18:41.220 | (both laughing)
01:18:43.180 | - It would be ironic and humorous.
01:18:46.100 | So what problem stands out to you
01:18:49.900 | as the most exciting and challenging,
01:18:53.460 | and impactful for us as a community in general
01:18:56.420 | and for OpenAI this year?
01:18:58.540 | You mentioned reasoning.
01:18:59.580 | I think that's a heck of a problem.
01:19:01.420 | - Yeah, so I think reasoning is an important one.
01:19:02.900 | I think it's gonna be hard to get good results in 2019.
01:19:05.620 | You know, again, just like we think about the life cycle,
01:19:07.580 | it takes time.
01:19:08.780 | I think for 2019, language modeling
01:19:10.460 | seems to be kind of on that ramp, right?
01:19:12.620 | It's at the point that we have a technique that works.
01:19:14.940 | We wanna scale 100X, 1,000X, see what happens.
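As an aside, the "technique that works" here is the autoregressive language-modeling objective: predict each next token from the tokens before it, then scale up the model, data, and compute. The sketch below is a toy PyTorch illustration of that objective, not OpenAI's GPT code; the model architecture, sizes, and data are invented for the example.

```python
import torch
import torch.nn as nn

# Toy autoregressive language model: given tokens t_1..t_k, predict t_{k+1}.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):               # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))  # (batch, seq_len, d_model)
        return self.head(h)                  # logits over the vocabulary

model = TinyLM()
tokens = torch.randint(0, 1000, (8, 32))     # fake batch of token ids
logits = model(tokens[:, :-1])               # predict each next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 1000), tokens[:, 1:].reshape(-1)
)
loss.backward()                               # gradient for one training step
```

Scaling "100X, 1,000X" keeps this same objective and simply grows the model, the dataset, and the compute behind it.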
01:19:18.100 | - Awesome.
01:19:19.060 | Do you think we're living in a simulation?
01:19:21.580 | - I think it's hard to have a real opinion about it.
01:19:24.820 | It's actually interesting.
01:19:26.580 | I separate out things that I think can, like,
01:19:29.180 | you know, yield materially different predictions
01:19:31.420 | about the world from ones that are just kind of,
01:19:34.220 | you know, fun to speculate about.
01:19:35.860 | And I kind of view simulation as more like,
01:19:37.980 | is there a flying teapot between Mars and Jupiter?
01:19:40.340 | Like, maybe, but it's a little bit hard
01:19:43.500 | to know what that would mean for my life.
01:19:45.100 | - So there is something actionable.
01:19:47.020 | So some of the best work OpenAI has done
01:19:50.760 | is in the field of reinforcement learning.
01:19:52.780 | And some of the success of reinforcement learning
01:19:56.620 | come from being able to simulate the problem
01:19:58.900 | you're trying to solve.
01:20:00.140 | So do you have a hope
01:20:03.660 | for the future of reinforcement learning,
01:20:05.300 | and for the future of simulation?
01:20:07.100 | Like whether we're talking about autonomous vehicles
01:20:09.100 | or any kind of system, do you see that scaling
01:20:12.940 | to where we'll be able to simulate systems
01:20:15.020 | and hence be able to create a simulator
01:20:17.860 | that echoes our real world, and prove once and for all,
01:20:21.620 | even though you're denying it,
01:20:22.660 | that we're living in a simulation?
01:20:25.100 | - I feel like there's two separate questions, right?
01:20:26.500 | So, you know, kind of at the core there of like,
01:20:28.420 | can we use simulation for self-driving cars?
01:20:31.240 | Take a look at our robotic system, Dactyl, right?
01:20:33.860 | That was trained in simulation using the Dota system,
01:20:37.020 | in fact, and it transfers to a physical robot.
01:20:40.460 | And I think everyone looks at our Dota system,
01:20:42.320 | they're like, okay, it's just a game.
01:20:43.560 | How are you ever gonna escape to the real world?
01:20:45.260 | And the answer is, well, we did it with the physical robot
01:20:47.480 | that no one could program.
01:20:48.700 | And so I think the answer is simulation
01:20:50.260 | goes a lot further than you think
01:20:52.100 | if you apply the right techniques to it.
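One of the "right techniques" described in the published Dactyl work is domain randomization: training across many randomly perturbed versions of the simulator so the policy becomes robust enough to survive the gap to real hardware. The sketch below only illustrates the idea; the parameter names, ranges, and the `make_env`/`policy` interfaces are invented for the example.

```python
import random

def randomized_physics():
    # Each episode draws fresh physics parameters; these ranges are
    # illustrative placeholders, not values from the Dactyl paper.
    return {
        "friction":    random.uniform(0.5, 1.5),
        "object_mass": random.uniform(0.03, 0.08),  # kg
        "motor_delay": random.uniform(0.0, 0.03),   # seconds
        "obs_noise":   random.uniform(0.0, 0.02),
    }

def train(policy, make_env, episodes=10_000):
    # The policy never sees the same "world" twice, so it has to learn
    # behaviors that work across the whole family of simulators --
    # ideally including the unmodeled real robot.
    for _ in range(episodes):
        env = make_env(randomized_physics())  # hypothetical env factory
        rollout = env.collect(policy)         # gather experience
        policy.update(rollout)                # any RL update (e.g. PPO)
```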
01:20:54.220 | Now, there's a question of, you know,
01:20:55.480 | are the beings in that simulation gonna wake up
01:20:57.520 | and have consciousness?
01:20:59.620 | I think that one seems a lot harder to, again, reason about.
01:21:03.020 | I think that, you know, you really should think about like,
01:21:05.380 | where exactly does human consciousness come from
01:21:07.940 | in our own self-awareness?
01:21:09.160 | And, you know, is it just that like,
01:21:10.740 | once you have like a complicated enough neural net,
01:21:12.380 | do you have to worry about the agents feeling pain?
01:21:14.980 | And, you know, I think there's like
01:21:17.660 | interesting speculation to do there,
01:21:19.460 | but, you know, again, I think it's a little bit hard
01:21:22.060 | to know for sure.
01:21:23.100 | - Well, let me just keep with the speculation.
01:21:25.020 | Do you think to create intelligence, general intelligence,
01:21:28.620 | you need one, consciousness, and two, a body?
01:21:33.180 | Do you think any of those elements are needed,
01:21:35.040 | or is intelligence something that's orthogonal to those?
01:21:38.500 | - I'll stick to the kind of like
01:21:40.220 | the non-grand answer first, right?
01:21:41.900 | So the non-grand answer is just to look at, you know,
01:21:44.380 | what are we already making work?
01:21:45.780 | You look at GPT-2, a lot of people would have said
01:21:47.820 | that to even get these kinds of results,
01:21:49.460 | you need real-world experience.
01:21:51.060 | You need a body, you need grounding.
01:21:52.580 | How are you supposed to reason about any of these things?
01:21:55.060 | How are you supposed to like even kind of know
01:21:56.500 | about smoke and fire and those things
01:21:58.040 | if you've never experienced them?
01:21:59.740 | And GPT-2 shows that you can actually go way further
01:22:02.980 | than that kind of reasoning would predict.
01:22:05.940 | So I think that in terms of do we need consciousness,
01:22:10.580 | do we need a body?
01:22:11.820 | It seems the answer is probably not, right?
01:22:13.380 | That we could probably just continue to push
01:22:15.100 | kind of the systems we have.
01:22:16.140 | They already feel general.
01:22:18.260 | They're not as competent or as general
01:22:20.540 | or able to learn as quickly as an AGI would,
01:22:23.020 | but, you know, they're at least like kind of proto-AGI
01:22:25.980 | in some way, and they don't need any of those things.
01:22:29.780 | Now let's move to the grand answer,
01:22:31.940 | which is, you know, if our neural net's conscious already,
01:22:36.500 | would we ever know?
01:22:37.420 | How can we tell, right?
01:22:38.900 | And, you know, here's where the speculation
01:22:40.900 | starts to become, you know, at least interesting or fun
01:22:44.900 | and maybe a little bit disturbing,
01:22:46.420 | depending on where you take it.
01:22:48.060 | But it certainly seems that when we think about animals,
01:22:51.260 | that there's some continuum of consciousness.
01:22:53.300 | You know, my cat, I think, is conscious in some way, right?
01:22:56.900 | You know, not as conscious as a human.
01:22:58.220 | And you could imagine that you could build
01:23:00.100 | a little consciousness meter, right?
01:23:01.220 | You point at a cat, it gives you a little reading.
01:23:03.060 | Point at a human, it gives you much bigger reading.
01:23:05.500 | What would happen if you pointed one of those
01:23:08.100 | at a DOTA neural net?
01:23:09.940 | And if you're training in this massive simulation,
01:23:12.180 | do the neural nets feel pain?
01:23:13.620 | You know, it becomes pretty hard to know
01:23:16.940 | that the answer is no, and it becomes pretty hard
01:23:20.180 | to really think about what that would mean
01:23:22.460 | if the answer were yes.
01:23:24.300 | And it's very possible, you know, for example,
01:23:27.620 | you could imagine that maybe the reason
01:23:29.580 | that humans have consciousness is because
01:23:32.300 | it's a convenient computational shortcut, right?
01:23:35.140 | If you think about it, if you have a being
01:23:37.140 | that wants to avoid pain, which seems pretty important
01:23:39.540 | to survive in this environment,
01:23:40.940 | and wants to, like, you know, eat food,
01:23:43.780 | then maybe the best way of doing it
01:23:45.620 | is to have a being that's conscious, right?
01:23:47.220 | That, you know, in order to succeed in the environment,
01:23:49.620 | you need to have those properties,
01:23:51.220 | and how are you supposed to implement them?
01:23:52.740 | And maybe this consciousness is a way of doing that.
01:23:55.420 | If that's true, then actually maybe we should expect
01:23:57.900 | that really competent reinforcement learning agents
01:24:00.020 | will also have consciousness.
01:24:02.080 | But, you know, it's a big if,
01:24:03.300 | and I think there are a lot of other arguments
01:24:04.860 | that you can make in other directions.
01:24:06.700 | - I think that's a really interesting idea
01:24:08.500 | that even GPT-2 has some degree of consciousness.
01:24:11.500 | That's something that's actually not as crazy
01:24:14.300 | to think about; it's useful to think about
01:24:16.580 | as we think about what it means to create intelligence
01:24:19.220 | of a dog, intelligence of a cat,
01:24:21.160 | and the intelligence of a human.
01:24:24.500 | So, last question, do you think we will ever fall in love,
01:24:29.500 | like in the movie "Her,"
01:24:32.060 | with an artificial intelligence system,
01:24:34.460 | or have an artificial intelligence system
01:24:36.300 | fall in love with a human?
01:24:38.660 | - I hope so.
01:24:40.260 | - I don't think there's any better way to end it than on love.
01:24:43.740 | So, Greg, thanks so much for talking today.
01:24:45.660 | - Thank you for having me.
01:24:46.940 | (upbeat music)