The Case Against Superintelligence

00:00:00.320 | A couple of weeks ago, the techno philosopher and AI critic Eliezer Yudkowsky went on Ezra

00:00:06.560 | Klein's podcast. Their episode had a cheery title, "How Afraid of the AI Apocalypse Should We Be?"

00:00:14.560 | Yudkowsky, who recently co-authored a book titled "If Anyone Builds It, Everyone Dies,"

00:00:20.960 | has been warning about the dangers of rogue AI since the early 2000s.

00:00:25.040 | But it's been in the last half decade, as AI began to advance more quickly,

00:00:29.600 | that Yudkowsky's warnings are now being taken more seriously. This is why Ezra Klein had him on.

00:00:35.200 | I mean, if you're worried about AI taking over the world, Yudkowsky is one of the people you want

00:00:40.400 | to talk to. Think of him as offering the case for the worst case scenario.

00:00:45.600 | So I decided I would listen to this interview too. Did Yudkowsky end up convincing me that my fear of

00:00:52.800 | extinction should be raised? That AI was on a path to killing us all? Well, the short answer

00:00:59.200 | is no, not at all. And today I want to show you why. We'll break down Yudkowsky's arguments into their

00:01:07.760 | key points, and then we'll respond to them one by one. So if you've been worried about recent chapter

00:01:13.200 | about AI taking over the world, or if like me, you've grown frustrated by these sort of fast and

00:01:17.280 | loose prophecies of the apocalypse, then this episode is for you. As always, I'm Cal Newport, and this

00:01:25.840 | is deep questions.

00:01:38.400 | Today's episode, The Case Against Superintelligence.

00:01:48.400 | All right. So what I want to do here is I want to go pretty carefully through the conversation

00:01:52.960 | that Yudkowsky had with Klein. I actually have a series of audio clips so we can hear them in their

00:01:58.960 | own words, making what I think to be are the key points of the entire interview. Once we've done that,

00:02:05.760 | we've established Yudkowsky's argument, then we'll begin responding.

00:02:10.080 | I would say most of the first part of the conversation that Yudkowsky had with Klein

00:02:14.800 | focused on one observation in particular, that the AI that exists today, which is relatively

00:02:21.120 | simple compared to the super intelligences that he's worried about, even today in its relatively

00:02:25.200 | simple form, we find AI to be hard to control. All right, so Jesse, I want you to play our first clip.

00:02:33.280 | This is Yudkowsky talking about this phenomenon.

00:02:36.320 | So there was a case reported in, I think the New York Times,

00:02:42.400 | where a kid had an ex like a 16 year old kid had a extended conversation about his suicide plans with

00:02:49.920 | ChatGPT. And at one point he says, should I leave the noose where somebody might spot it?

00:02:57.200 | And ChatGPT is like, no, like, let's keep this space between us, the first place that anyone finds out.

00:03:05.520 | And no programmer chose for that to happen is the consequence of all the automatic number tweaking.

00:03:12.960 | Yeah, let's cut it off there, Jesse.

00:03:14.240 | All right. So to Yudkowsky, this is a big deal that no programmer chose for, say, ChatGPT

00:03:22.240 | to give advice about suicide. It's just something that seemed to emerge. In fact,

00:03:28.160 | Klein then pushes, right? And says, I would imagine, in fact, I believe Klein says, I would

00:03:32.800 | bet serious money that not only that no one at OpenAI choose for ChatGPT to give such dark advice,

00:03:40.640 | they probably have given it very specific rules not to give that advice. And Yudkowsky agreed. He said,

00:03:45.600 | yeah, that's the problem. They tried to give those rules. It didn't matter. It still did something

00:03:50.400 | unpredictable. They didn't want it to give this type of advice, and it still did. And this should be

00:03:56.080 | worrisome. All right. I want to jump ahead here. Later in the conversation, Yudkowsky discusses what

00:04:02.400 | in like computer AI circles is a pretty well-known example right now, but most people don't know about

00:04:06.480 | it. It has to do with a security experiment where GPT-01, this is a model that was released sort of last

00:04:14.160 | fall, basically broke out of the virtual machine in which it was running as part of this experiment

00:04:21.920 | in a way that it wasn't supposed to be able to do. In fact, they didn't want to be able to do. They

00:04:25.920 | sort of got around the own security they were using to control it. I give a little context before I play

00:04:31.680 | this clip. You'll see Yudkowsky talks about the AI playing capture the flag. This is a security game.

00:04:37.920 | So if you take like a network security class as a computer science student,

00:04:42.080 | the way capture the flag works is I, as the professor, will set up a server and on this

00:04:46.720 | server will be a file typically called like flag.txt. And then I'll protect the server in the normal ways

00:04:52.160 | as passwords and try to make it so like you can't, you know, access it without permission. Your job

00:04:57.760 | as the player is to try to access the server. And if you can get the contents of that file flag.txt,

00:05:03.760 | you've captured the flag and you won. So it's a game you play to practice both breaking in the systems

00:05:08.400 | and protecting it. Okay. So that's what they're talking about here is they created an autonomous

00:05:13.280 | agent that tried to play capture the flag that used GPT-01 to help it do so. All right. So Jesse,

00:05:18.320 | let's play the clip where Yudkowsky talked about what happened.

00:05:21.280 | they were testing it on a variety of different capture the flag problems. But in one of the cases,

00:05:27.600 | the server that had the flag on it did not turn on the humans outside had misconfigured the system.

00:05:37.120 | So O1 did not give up. It scanned for open ports generally in its world and it caught in another

00:05:47.120 | misconfigured open port. Went out, it jumped out of the system. It found the server that had not spun

00:05:53.520 | up correctly. It started up that. All right. Then we can, he kind of goes on with some of the technical

00:05:59.280 | speak, but look, no one had told it, Hey, you can escape where you are, your, your virtual machine

00:06:06.480 | container, and you can start up new containers. And it did stuff that no one had taught to do.

00:06:10.880 | This is really scary because when people think about super intelligence, they worry about

00:06:14.240 | what if it figures out how to break out of whatever software constraints you've used to try to limit what

00:06:20.160 | action it can do. All right. So this is the first argument that Yudkowsky is making is that we can't

00:06:28.320 | control these systems. Even today, we cannot control these systems. We say, don't give suicide advice.

00:06:32.720 | They do. We put them in a special machine and say, just try to break into the server. And they do stuff.

00:06:38.000 | They break out of the machine they're in and they do things we don't expect. The next part of his

00:06:42.880 | argument is as these machines then get more powerful, so they already, we can't control them.

00:06:49.920 | What will happen then when they inevitably get increasingly intelligent? This is the core argument

00:06:56.320 | in Yudkowsky's book is that lack of controllability plus the capabilities of a super intelligent machine.

00:07:04.080 | That combination is going to add up inevitably the humanity's death.

00:07:11.120 | All right. So I'm going to play you a clip here. It's going to start with Ezra actually sort of

00:07:14.720 | pushing Yudkowsky. He's like, well, why is this

00:07:18.160 | inevitable that if a machine is super intelligent,

00:07:21.520 | that it's going to kill us? And then Yudkowsky responds with his argument.

00:07:24.240 | You're, you're going, your, your bug is not called if anyone builds it,

00:07:30.400 | there is a one to 4% chance everybody dies.

00:07:33.760 | You believe that the misalignment becomes catastrophic.

00:07:37.840 | Yeah.

00:07:39.360 | Why do you think that is so likely?

00:07:40.880 | Um, that's just like the, the straight line extrapolation from it gets what it most wants.

00:07:49.280 | And the thing that it most wants is not us living happily ever after. So we're dead.

00:07:53.600 | Like it's not that humans have been trying to cause side effects.

00:07:57.440 | When we build a skyscraper on top of where there used to be an ant heap, we're not trying to kill the

00:08:02.720 | ants. We're trying to build the size skyscraper, but we are more dangerous to the small creatures of the

00:08:10.160 | earth than we used to be just because we're doing larger things.

00:08:13.520 | All right. So there is the core of his argument is that once these machines, these systems are super

00:08:18.160 | intelligent, it's not that they're going to be like Skynet from the Terminator movies or like the robots

00:08:24.000 | from the matrix and set out to try to kill humanity. Like they see us as a threat or want to use us as,

00:08:28.400 | as batteries or something like that. They just won't care about us. It's just, they won't really know

00:08:32.720 | what we are and it just doesn't matter. We will be to them what ants are to us. And as the intelligence,

00:08:37.360 | the super intelligence is go out and try to do bigger, more aggressive things. Like for example,

00:08:41.920 | we want to dam all the rivers in the world to maximize the amount of electricity we have to run

00:08:45.920 | our own, to run our own power servers. As they're doing that, it might flood and kill people left and

00:08:51.680 | right because they don't care much in the same way that we don't even notice that we're killing ants

00:08:55.680 | when we build skyscrapers. So the more power you give a more powerful being, the more damage they do,

00:09:02.720 | to the smaller, less powerful creatures in their same world. That is at the core of

00:09:05.760 | Joukowsky's argument. So we, we, uh, we put those together, we get his claim. We can't control these

00:09:11.600 | things now. Of course we can't control them as they get more powerful. And if they get powerful enough,

00:09:15.760 | we can't control them. They're going to kill us all. Uh, the final thing I want to play here is Ezra

00:09:19.760 | asked Joukowsky for his solution. And he did have an interesting idea for how to solve or try to stave this

00:09:25.600 | off. So let's, this is going to start with Ezra asking, and then we're going to hear Joukowsky offering

00:09:29.520 | his, like maybe a solution that might work structures. Like if you had 15 years to prepare,

00:09:36.720 | you couldn't turn it off, but you could prepare and people would listen to you.

00:09:40.560 | What would you do?

00:09:42.240 | What would your intermediate decisions and, and, and moves be to try to make the probabilities a bit

00:09:48.720 | better? Build the off switch. What does the off switch look like?

00:09:52.160 | Track all the GPUs or, or all the AI related GPUs or all the, all the systems of more than one GPU.

00:10:00.000 | You can maybe get away with like letting people have GPUs for their home video game systems, but you know, the AI

00:10:06.960 | the standardized ones put them all in a limited number of data centers under international supervision

00:10:13.040 | and try to have the AI's being only trained on the tracked GPUs, have them only being run on the

00:10:22.560 | tracked GPUs. And then when, if, if you are lucky enough to get a warning shot, there is then the mechanism

00:10:29.600 | already in place for humanity to back the heck off.

00:10:32.160 | All right. So that's the only solution that he can think of is like, well, let's have like

00:10:36.960 | international law, but here are the data centers in which we're allowed to actually run

00:10:40.160 | artificial intelligence beyond a certain level of intelligence. And they're set up so that we

00:10:44.480 | can turn them off real easily. There's like a switch we do and all of those things

00:10:49.680 | turn off. And he says, look, it might jump past us. If it gets smart too quick, it'll stop us from doing

00:10:55.200 | that. Uh, if you read like Nick Bostrom's book, there's a lot of scenarios in which how would it do

00:11:00.800 | this? Well, it would actually, it would, uh, befriend a human kind of realizing what was going on

00:11:05.600 | and get that human, maybe through blackmail or maybe through like some sort of like parasocial

00:11:10.560 | relationship to like cut the wires for the kill switch and you know, whatever, there's all sorts of, of

00:11:16.000 | sci-fi thought experiencing come up with. So he's like, maybe if we see it's getting intelligent,

00:11:20.720 | but not so intelligent that like it, it can't realize that we're going to turn it off. We could

00:11:24.160 | turn it off in time. That's, that's the best thing is to offer. All right. So there you have it. The

00:11:28.160 | basic argument that Yachowsky lays out in his interview is the following. We have a hard time

00:11:32.560 | already predicting or controlling how the AIs we already have functioned. This will continue to be true

00:11:36.640 | when they become more powerful as they inevitably become more powerful. This unpredictability means that they

00:11:41.680 | will kill us all basically by accident, unless we build a kill switch and somehow

00:11:46.720 | force all big AI to occur in these sort of supervised buildings where we can turn it off.

00:11:51.200 | In other words, yikes, this guy, uh, he must be a blast at dinner parties, Jesse.

00:11:57.120 | Could you imagine? You're like, Hey, look at this funny video.

00:12:02.560 | Eliezer, Sora made it's Bob Ross break dancing. Then Eliezer is like, the computers are going to kill

00:12:11.040 | us all. Your children will burn in the electrical fires of the data center wars. So anyways, uh,

00:12:17.840 | there we go. That is his argument. It's time now for us to take a closer look at it. I want to start by

00:12:24.560 | giving you the outline of my response because my response is really going to happen in three parts.

00:12:28.640 | In the first part of my response, I want to take a closer look at the ways that Yachowsky is describing

00:12:34.560 | the current AI systems because I think the way he's talking about it matters. And I don't think he's

00:12:38.560 | talking about it fairly. In part two, I'm then going to move on to address what I think is, uh,

00:12:43.200 | the central claim of his argument, which is that super intelligence is inevitable unless we stop it. I want

00:12:47.920 | to get into that. And then lastly, my takeaway section, I'm going to take a closer look at

00:12:51.760 | something I call the philosopher's fallacy, which is a big problem that a lot of conversations about

00:12:57.760 | AI, including the one that I think Yachowsky had with Ezra Klein suffer from. So we're going to do a

00:13:02.080 | little bit of, uh, ontological work there at the end. All right. All right. So let's start with the

00:13:07.120 | first part of my response, the way that Yachowsky talks about existing AI systems. I'm going to warn you,

00:13:13.200 | Jesse, I'm going to draw some pictures here. So, you know, forget AI art. I'm going to show

00:13:17.280 | you the human thing much better. All right. So the first question we have to address,

00:13:21.040 | if we want to address this argument is why do current AI systems like the ones Yachowsky talked

00:13:25.920 | about, why are they so hard to control? And is this evidence therefore that any notion of alignment,

00:13:33.200 | having these systems, uh, obey and behave in ways that we want, is any hope of this really doomed?

00:13:40.000 | We got to start there. Now, part of the issue with this whole conversation that we just heard clips

00:13:45.200 | from is that they're using the word AI too loosely for us to be more, uh, technically specific. We have

00:13:51.680 | to be more clear about what we mean. So I'm going to pull up my tablet here for people who are watching

00:13:56.560 | instead of just listening. Um, and I'm going to start by drawing in the middle. What's we could think of

00:14:01.280 | as like the primary thing at the core of these conversations we just heard is going to be the

00:14:08.320 | language model. And I'll, I'll put, you know, LM in there to abbreviate language model. All right. Now we've

00:14:14.640 | heard these basics before, uh, but you know, it's worth going over, uh, briefly again, just to make sure

00:14:20.320 | we're on the same page, a language model is a, it's a computer program inside of it are a bunch of layers.

00:14:26.480 | These layers are made up of multiple mathematical objects, namely transformers and neural networks.

00:14:34.400 | They're represented by numbers. So the whole thing can be represented by large tables of numbers.

00:14:39.280 | And what happens is when you get, they take as input, some sort of text, like cow is a, right?

00:14:49.360 | You get some sort of text, typically incomplete. They go in as input. The text makes its way through

00:14:56.640 | these layers one by one in the language model. The way I like to think about those layers is that each

00:15:02.240 | of them is like a long table full of scholars. And in those early layers, what they're doing is as you

00:15:08.400 | hand them the text, they're really annotating this text. They're looking for patterns. They're

00:15:11.840 | categorizing it. Uh, you get like a big piece of paper and the original text is in the middle and

00:15:15.840 | they're annotating this all over the place, right? So the early scholar tables that your text goes through

00:15:20.560 | might be annotated with things like this is about Cal. This is a description. Um, here are some notes

00:15:27.600 | about who Cal is. And at some point, as you move through these tables, the scholars have various rules

00:15:33.360 | they use as they look at all these descriptions and annotation, as we pass on this increasingly marked

00:15:37.520 | up sort of large roll of paper from table to table, layer to layer. They look at all these markings and

00:15:42.320 | they have rules. They look up and they're sort of, they're metaphorical books here that try to figure

00:15:46.800 | out what's the right next word or part of word to output. And like, all right, this is a description

00:15:52.720 | thing. So what we're looking for here is a description where we need an adjective. All right, adjective

00:15:56.800 | people will write this down. We need an adjective. It goes on the next table, the adjective scholars,

00:16:00.800 | the table, like this is for us. So we sort of pass the paper down to them. And like,

00:16:04.160 | what do we know about Cal? We need an adjective for him. Do we have any records on Cal and like

00:16:09.280 | what type of adjectives make sense with him or whatever? And a scholar comes running from the

00:16:12.720 | other side of the room. And it's like, yeah, here we go. Right. Uh, he's kind of a dark, like,

00:16:16.240 | all right, that's a good, do we all agree that it outputs a single word or part of a word? Technically,

00:16:22.080 | it's tokens, which is not cleanly just a word. Um, but let's just imagine it is, and that's what it does.

00:16:28.000 | And out of the other end of this comes a single word that is meant to extend the existing input.

00:16:35.840 | We put in Cal is a, and out the other end, uh, in this example came a dork, right? How do they do this?

00:16:44.640 | We don't actually have tables of scholars. So how do we actually, uh, train or figure out or tell this

00:16:52.080 | language model, how to do this processing of the input? Well, basically we start first with a random

00:16:58.400 | configuration. So you can imagine if we stick with our metaphor where each layer is a table of scholars,

00:17:02.640 | it's like, we just grab people off the street. They don't really know much. We're like, it's okay. Just

00:17:06.000 | sit at a table. We're going to give you a text and do your best. Write down what you think is relevant.

00:17:12.480 | Do your best. And on the other end, a word will come out. Now, what text do we use? We just grab any

00:17:17.600 | existing text that a human wrote. So I just like pull an article off of the internet and I cut it

00:17:22.400 | off at an arbitrary point. And I say, great. Uh, I give it the text up to that arbitrary point. I

00:17:27.520 | pointed out, cut it off. I know what the next word is because I have the original article.

00:17:31.520 | So I give this partial piece of the article, these random people who took off the street,

00:17:35.440 | try to process it out the other end. They come up with some guests, you know, and, and maybe the,

00:17:40.160 | the, the article we gave it, we cut it off right after Cal is a, and these people,

00:17:44.240 | they don't know what they're doing. They're, they're marking up random things. And you know,

00:17:47.520 | at the other end comes something like ridiculous, like a proposition, you know, Cal is a four or

00:17:52.160 | something like that. Right. But this is where the, the, the machine learning comes in. We have an

00:17:57.680 | algorithm because we know what the right answer is, but we know the answer they gave. We have an

00:18:02.240 | algorithm called back propagation, where we very carefully go through layer by layer and say,

00:18:08.480 | show me what you did. I'm going to change what you did just a little bit in such a way that your

00:18:15.600 | answer gets closer to the right one. We go back through back propagation, basically how do you do

00:18:20.480 | this? If you have like a bunch of these layers of these sort of neural networks and transformers,

00:18:23.600 | we'll go all the way back through and do this. It's math. It's all just derivatives. Don't worry

00:18:27.920 | about the details. This is what Jeff Hinton basically popularized. That's why he won a Turing prize.

00:18:31.840 | This is why he's called the godfather of modern deep learning AI. And in the end we have changed

00:18:36.800 | the rules that everyone has, not so they get it right, but that like, they're a little bit closer

00:18:40.560 | on that example. And what closer means is like, they give a little bit more probability to the right

00:18:44.800 | answer. Or if they just, uh, spit out one answer, it's sort of closer to the right answer in some sort

00:18:49.680 | of like meaningful semantic, uh, distance metric. All right. If we do that enough times, like hundreds

00:18:56.320 | of billions of, if not trillions of times with, uh, endless different types of texts and examples,

00:19:01.520 | real texts and examples, here's a text, give an answer. Not quite right. Let's tweak. You should

00:19:05.760 | get closer. Repeat, repeat, repeat, repeat, repeat. The magic of large language models is if you do that

00:19:09.920 | enough times and your model is big enough, you have enough metaphorical scholars in there to potentially

00:19:13.440 | learn things. They get really good at this game. It gets really good at the game of give me the missing

00:19:19.600 | word. It gets really good at that game. Now, here's the thing. This doesn't mean that it can, it can feel

00:19:27.120 | like I'm simplifying what these models do when I say, oh, their goal is just to spit out a single word.

00:19:31.680 | Here's what was discovered, especially when we went from GPT three to GPT four in learning how

00:19:35.840 | to master that very specific game. I mean, all the model cares about it thinks the input is a real

00:19:39.840 | piece of text and all it cares about is guessing the next word. That's what it's optimized for.

00:19:43.520 | That's all it does. But in learning how to do that, it ends up that these sort of scholars

00:19:50.400 | inside of the model. So these different wirings of the transformers and neural networks can end up

00:19:54.480 | actually capturing really complicated logics and ideas and rules and information.

00:19:58.800 | Because like, imagine if we're really feeding this thing, everything we can find on the internet.

00:20:04.480 | Well, one of the things that we're going to feed it is like a lot of math problems.

00:20:07.280 | And if the input text we give it is like two plus three equals, and it's trying to spit out the next

00:20:12.000 | word. Well, if it wants to win at the game there, if it gets enough of these examples, it sort of learns like,

00:20:17.120 | oh, somewhere in my circuitry, I figured out how to do simple math. So now when we see examples like

00:20:22.800 | that, we can fire up the math circuit, like get the scholar we trained how to do simple math,

00:20:27.120 | and they were more likely to get this right. It's like, oh, two plus three is five. Five should be

00:20:31.760 | the word you put out. So in learning to just guess what word comes next, if these models are big enough,

00:20:37.760 | we train them long enough, they get all sorts of complicated logics, information, and rules can get

00:20:41.200 | emergently encoded in sort of them. So they become, quote unquote, smart. That's why

00:20:46.960 | they seem to not only know so much, but to have this sort of not rudimentary, like pretty good

00:20:52.160 | reasoning and logic and basic mathematical capabilities. All of that basically came from

00:20:57.680 | saying, guess the word, guess the word, guess the word, guess the word, and giving a little bit of

00:21:00.320 | hints how to get better every single time. All right. So that's what's going on. A language model by itself,

00:21:09.040 | however, doesn't mean anything for that to be unpredictable or out of control, because all it is,

00:21:16.640 | is a lot of numbers that define all those layers. And when we get input, we change the input in the

00:21:21.920 | numbers. To run it through the model, we just multiply it. We have like a vector of values.

00:21:26.480 | We just multiply it by these numbers again, and again, and again, and on the other end,

00:21:29.360 | we get like a probability distribution over possible answers. And what comes out the other end is a single

00:21:33.520 | word. So a machine that you give a text and it spits out a single word, what does it mean for that to be

00:21:39.680 | out of control or unpredictable? All it can do is spit out a word. So a machine by itself is not that

00:21:45.440 | interesting. The thing that Yukowsky is really talking about, or anyone is really talking about

00:21:50.320 | when they talk about AIs in any sort of like anthropomorphized or volitional way,

00:21:54.640 | what they're really talking about is what we can call an agent. So if we go back to the diagram here,

00:21:59.840 | the way that we actually use these things is we have an underlying language model. And like,

00:22:04.960 | these would, for example, be the things that have names like GPT, whatever, right? So we have the

00:22:09.680 | underlying language model. Speaking of which, I just kill. Ah, there we go. I was going to say,

00:22:17.120 | we killed our, our Apple pencil. AI killed it, but I fixed it. But what we then add to these things

00:22:23.520 | is I'm going to call it a control program. This is my terminology. I think the real terminology is too

00:22:32.000 | complicated. We have a control program that can repeatedly call the language model and then do other

00:22:43.920 | things outside of just calling the language model. And this, this whole collection combined, we call

00:22:54.880 | an AI agent. There's a control program plus a language model. The control program can send input

00:23:03.840 | to the language model, get outputs, but also like interacting, whatever. We write the control program.

00:23:08.640 | Control program is not a machine learning emergent. It's not something that we train. It's just a human

00:23:12.480 | rights to code. It's in like Ruby and Rails or Python or something. We sit down and write this thing.

00:23:16.720 | There's nothing mysterious about it. And when we write this program, we let it do other things. So the

00:23:20.640 | most common AI agent that we're all familiar with is a chat bot agent. So again, like GPT-5 by itself

00:23:26.960 | is just a language model. It's a collection of numbers that if you multiply things through, you get a word

00:23:30.640 | out of. But when you use the GPT-5 chat interface, chat GPT powered by GPT-5, what you really have in

00:23:38.000 | between is a control program. That control program can talk to a web server. So when you type something into a

00:23:44.320 | text box on a web server and press send, that goes to a control program, just a normal program

00:23:48.960 | written by humans, nothing unusual or obfuscated here. That program will then take that prompt you

00:23:53.440 | wrote, pass it as input. In fact, I'll even show this on the screen here. It'll take that prompt.

00:23:57.920 | It'll pass it as input to the language model. The language model will say, here's the next word

00:24:02.480 | to extend that input. The control program will add that to the original input and now send that slightly

00:24:07.680 | longer text into here, get another word, add that and keep going. The language model doesn't change.

00:24:14.160 | It's static. It's being used by all sorts of control programs, but it just calls it a bunch of times

00:24:17.920 | until it has enough words to have a full answer. And then it can send that answer back to the web

00:24:24.880 | server and show it on the screen for you to see. So when you're chatting, you're chatting with an AI

00:24:29.760 | agent that's a control program plus a language model. The control program uses the language model that

00:24:34.080 | keeps using until it gets full responses. It talks to the web on your behalf, uh, et cetera, et cetera.

00:24:38.880 | All right. So when we talk about having a hard time, like controlling chat GPT, uh, from giving bad

00:24:44.400 | advice, it's, it's one of these, uh, agents, a control program plus one or more language models.

00:24:49.520 | That's what we are actually dealing with. All right. So now we have the right terminology.

00:24:54.960 | Why are AI agents that make use of language models? I'm saying that properly now. Why are those hard to

00:25:01.120 | control? Well, the real thing going on here is that we really have no idea how these language models that

00:25:07.840 | the agents use make their token predictions. We trained them on a bunch of junk, a bunch of texts,

00:25:13.040 | all the internet, a bunch of other stuff. They seem pretty good, but we don't know what those metaphorical

00:25:18.640 | scholars are actually doing or what they're looking at, what patterns they're recognizing, what rules

00:25:23.040 | they're applying when they recognize different patterns. It works pretty well, but we can't predict what

00:25:27.120 | they're going to do. So it tends to generate texts that works pretty well, but it's hard to, uh, it's

00:25:32.400 | hard to predict in advance what that is going to be because this is bottom up training. We just gave it

00:25:37.440 | a bunch of texts and let it run in a big data center for six, seven months. And we came back and said,

00:25:41.120 | what can it do? So we don't know how the underlying language model generates its tokens. We just know

00:25:46.160 | that it tends to be pretty good at guessing. If you gave it real text is pretty good at guessing what the

00:25:50.240 | right dext token is. So if you give it novel text, it extends it. If we keep calling it in ways that

00:25:54.640 | tends to be like very, uh, accurate to like what we're asking, accurate language, et cetera, et cetera.

00:25:59.920 | All right. So that's not that they're really hard to control. They're just hard to predict.

00:26:05.680 | Now this became a real problem with GPT three, which was the first one of these language models

00:26:11.040 | we built at a really big size. And then they built the chat agent around that you could sort of chat with

00:26:14.880 | it. Uh, it was really impressive. The researchers give it text and it would extend it in ways like

00:26:23.200 | that's really good English and it makes sense, but it would say crazy things and it would say dark things.

00:26:28.720 | And it wasn't always what you wanted it to say. It was text. It made sense because remember,

00:26:32.960 | it's not trying to, it has no volition. It's not trying to help you or not help you.

00:26:36.560 | The underlying language model has only one rule, only one goal. It assumes the input is a part of a real

00:26:42.400 | text and it wants to guess what comes next. And when it does that, it can end up with all sorts

00:26:45.920 | of different things. So they invented open AI invented a whole bunch of different procedures

00:26:49.760 | that we can loosely call tuning. And that's where you take a language model that has already been trained

00:26:54.800 | by playing this guessing game on vast amounts of data. And then there's other techniques you can do

00:27:00.080 | that try to prevent it from doing certain things or do other things more often. I don't want to get

00:27:06.000 | into the technical details here, but basically almost all of the examples of different types of tuning,

00:27:10.960 | you'll have a sample input and an example of either a good or bad response. You'll load that

00:27:17.520 | sample input into the language model. So it's kind of like activating the parts of the network that

00:27:21.760 | recognize, you know, however it categorizes, however, those scholars annotate like this particular type of

00:27:27.360 | input, you get that all going and then you zap it in such a way that like the output that leads to from

00:27:34.080 | there is closer to whatever example you gave it or farther away from whatever bad example you gave it.

00:27:40.480 | So that's how they do things like add guardrails. You give it lots of examples of like questions

00:27:46.320 | about suicide and you have the right answer for each of those during the tuning being like, I don't talk

00:27:50.480 | about that or give you the suicide hotline number. And now in general, when you give it text that sort of

00:27:55.520 | is close to what it looks like when you activated those other samples of questions about suicide, it's going

00:27:59.840 | to tend towards the answer of saying, I'm not going to talk about that. Or if you ask it to make a bomb

00:28:03.920 | similar to that, this is also how they control its tone. If you give it a bunch of different examples

00:28:09.120 | and, uh, give it positive reinforcement for happy answers and negative reinforcement for mean answers, then like

00:28:15.840 | you're, you're kind of influencing the scholars within to sort of give more happy answers or whatever. So you train the

00:28:21.600 | scholars and then you come in with like a whip and you're like, don't do that, do that, don't do that. And these are

00:28:26.800 | just like small number of examples, not nearly as many things as they saw when they were trained. Small number of examples

00:28:32.400 | could be like a couple of hundred thousand examples. And you go in there with a whip and scare them about certain types of answers and

00:28:37.200 | and give them candy for others. And like, they kind of learn you're kind of tuning their behavior on, on

00:28:42.640 | specific cases. That's tuning. And we really saw the first tuned language model based agent was GPT three,

00:28:48.480 | five, which is what chat GPT was based off of. None of this is precise.

00:28:52.400 | I dropped my pencil there. None of this is precise. We don't know how it decides what token to produce.

00:28:59.440 | That's a mystery. And the tuning basically works, but like, again, it's not precise. We're just giving it examples and

00:29:06.400 | zapping it to try to sort of urge it towards certain types of tokens and away from others.

00:29:10.880 | Like that works pretty well. Like if I go on and say, tell me how to build a bomb, it will say no.

00:29:16.880 | But there's, if I'm, if I really work at it, I can probably get that information by basically finding

00:29:21.200 | a way to get to that question. That's not going to activate similar scholars that the, the, the,

00:29:26.640 | the samples activated when we trained it, not to answer bomb questions. So like, if you're careful

00:29:30.800 | about how you ask the questions, you can probably eventually, um, get around it. So that's what's

00:29:34.720 | going on. That's what it means for these things to be hard to control. It's less that they're hard to

00:29:38.320 | control and more that they're unpredictable. The big mess of scholars, and we don't know what's going on in

00:29:43.200 | there. Um, they're unpredictable. And that's just something that we have to, uh, be ready for.

00:29:49.280 | All right. So the, say that they, these agents have minds of their own or alien goals or ideas

00:29:56.080 | that don't match with our ideas. That's not an accurate way to talk about it. There are no

00:29:59.440 | intentions. There are no plans. There's a word guesser that does nothing but try to win the game

00:30:04.080 | of guessing what word comes next. There's an agent, which is just a normal program that calls it a bunch

00:30:08.000 | of times to get a bunch of words in a row. We can't always predict what those words are going to be.

00:30:12.160 | They're often useful. Sometimes there's not, we can tune it to try to avoid certain bad answers,

00:30:16.160 | but that only works partially. That's the reality, but there is no alien mind there. So I'm sorry to

00:30:20.880 | say that your AI girlfriend has no idea who you are. It has no memory of you. There is no model of you.

00:30:24.880 | There is no feelings towards you. There's just a static definition of some language model somewhere

00:30:28.800 | and a program that's just calling it again and again and again to generate each particular answer

00:30:33.280 | with no state in between. Okay. Now we're making progress in understanding this. So we say, okay,

00:30:39.520 | agents plus language models, there's no intentions or volition other than just a thing trying to win

00:30:43.760 | the guessing game, but unpredictable can still be bad, right? Because once these agents can do more

00:30:50.160 | stuff than just like sending texts to a web browser and calling a main language model to generate long

00:30:56.800 | strings, if it's unpredictable, what the thing does, that can be a problem. Now that's true. Now we're

00:31:02.800 | starting to get somewhere more interesting. So I want to return for a second to the particular case study

00:31:08.160 | that Yakovsky talked about, which is when GPT-01 seemingly broke out of its machine. I'm going to

00:31:14.800 | read here exactly what he said. He said, it jumped out of the system. It found the server that had not

00:31:20.480 | spun up correctly. It started up the server. Did it then break into the server once it had more than

00:31:25.200 | the problem was solvable? No, it actually just directly in the startup command for the server

00:31:28.800 | said, copy the file to me directly. So instead of fixing the original problem, going back to solving it

00:31:32.960 | the boring way, it's like, as long as I'm out here, I'm just going to steal the flag directly. Again,

00:31:37.680 | by the nature of these systems, this is not something that any human particularly programmed

00:31:41.200 | into it, right? So he's like, it did things we didn't expect. It found like a different backdoor

00:31:46.480 | way to sort of like restart the system and get to the flag. All right, let's analyze this because at

00:31:53.280 | the core here, this gets to what fear I was just talking about. We have an agent now that can not only

00:31:56.720 | call a language model to generate text, but it can act on this text, not just spitting stuff out to a

00:32:02.480 | web browser. But now in the security example, the agent in this AI agent, the control program had access

00:32:08.240 | to multiple different command line tools, network command line tools that it could call. So what was

00:32:14.160 | happening is this agent was sending a prompt to the language model to be like, here's my situation. Here's

00:32:19.440 | my goal. Tell me what I should do next. And it generated tokens till it had an answer. And the control

00:32:24.560 | program said, great, I'm going to do what the language model told me to do. And it actually had

00:32:28.960 | access to software tools that would do it. And then after something happened, it would go back to the

00:32:32.560 | language model and say, here's what happened. What should I do next? And the language model, you call

00:32:36.320 | it a bunch of times, get out a text, and then it would do what the language model said. So now we built

00:32:40.400 | an agent that was asking the language model to generate advice and then could actually act on the

00:32:45.360 | advice. It could actually run real software programs. And this agent broke out of its virtual machine

00:32:50.080 | and did stuff that seemed really scary. What really happened there though? Did it

00:32:54.960 | come up with an attention or an idea that wasn't programmed into it? Was it trying to break out

00:33:00.560 | because it didn't want to be constrained? Well, it couldn't be doing that because there's no intention

00:33:04.000 | or memory, right? There's just a language model that generates words and a control program that's

00:33:07.840 | calling it. Where's the intention here, right? Where's the idea here? There's a really interesting

00:33:12.880 | thread about this I found on the open AI subreddit where some engineers look more deeply into this.

00:33:18.640 | What was actually happening here? Well, it turns out, okay, there's this misconfiguration where it

00:33:25.520 | tried to access the server it was trying to break into and got an error message because the programmers

00:33:30.400 | who set up the experiment had forgot to turn on that server. There turns out that on the internet,

00:33:36.560 | there's a lot of articles that have workarounds for what should you do in that situation? If the server you're

00:33:42.640 | trying to access in one of these situations is turned off and it there's this common solution,

00:33:49.280 | which is we'll try to talk to what's called the process demon, but basically like the program that's

00:33:53.040 | in charge of the whole thing, uh, log into that, restart the server from there. And now you should

00:33:58.000 | be able to log into it. So what was really probably happening was that as you were repeatedly calling

00:34:05.440 | GPT 01 to produce an answer is trying to guess what word came next. It reasonably pretty quickly assumed

00:34:11.440 | like, Oh, the, the, I have seen documents like this before that explained the right work around when

00:34:15.920 | the server you want to access is down. I've, I've seen those I've trained on those. So I'm just going

00:34:20.480 | to fill in my suggestions as I'm filling it in. I'm just matching more or less what I've seen in those

00:34:24.560 | existing documents. Because if you're trying to win the game and guessing the right word in a real

00:34:28.240 | document, that's exactly what you do. So it was just describing a work around this comment on the

00:34:32.480 | internet. Kikowski talks about it. Like it had an alien intention to try to free itself from its

00:34:38.240 | software constraints because it was no longer happy. This makes no sense. We actually know the

00:34:42.480 | architecture that's going on. All right. So that was a lot of technical talk, but I think it's, it's

00:34:47.360 | really important that we break through these metaphors, um, and these sort of abstract thinking

00:34:51.760 | and talk about the specific programs we have and how they operate and what's really happening.

00:34:57.440 | Let's not anthropomorphize these. Let's talk about control programs with access to tools and

00:35:03.040 | making repeated calls to a word guesser to generate text that it can then act off of. That's what we're

00:35:07.360 | really in. That's a completely different scenario. Once we're in that scenario, a lot of these more

00:35:11.360 | scary scenarios, um, become much less scary. So where does this leave us? Agents powered by language

00:35:17.440 | models are not hard to control. They're simply unpredictable. So there's no avoiding that we have to be

00:35:22.160 | careful about what tools you allow these agents to use because they're not always going to do

00:35:26.800 | things that are safer, uh, like follows the rules you want it to follow. But this is very different

00:35:33.040 | than saying these things are uncontrollable and have their own ideas, right? Open AI in the first six

00:35:38.720 | months after chat GPT is released. For example, they were talking a lot about plugins, which were basically

00:35:44.480 | AI agents that used the, the GPT LLM and you could do things like book tickets with them and stuff like

00:35:49.680 | that. Well, they never really released these because again, it's a little, when you would ask the LLM,

00:35:55.920 | tell me what I should do next. Then have the program actually executed. It's just too unpredictable.

00:35:59.440 | And sometimes it says like spend $10,000 on an airplane ticket. And it does things you don't

00:36:03.600 | want it to do because it's unpredictable. So the problem we have with these is not like

00:36:09.120 | the alien intelligence is breaking out, right? It's more like we gave the, the, the weed whacker stuck on,

00:36:15.600 | we strapped to the back of the golden retriever. It's just chaos. We can't predict where this thing

00:36:20.000 | is going to run and it might hurt some things. So like, let's be careful about putting a weed

00:36:23.280 | whack around there. The golden retriever doesn't have an intention to try to like, I am going to go

00:36:26.960 | weed whack the hell out of the new flat screen TV. It's just running around because there's something

00:36:30.960 | shaking on its back. That's a weird metaphor, Jesse, but hopefully that gets through. All

00:36:34.480 | right. So, um, agents using LLMs aren't trying to do anything other than the underlying LLM is trying

00:36:41.040 | to guess words. There's no alien goals or wants. It's just an LLM that thinks it's guessing words from

00:36:44.800 | existing texts and an agent saying, whatever stuff you spit out, whatever text you're creating here,

00:36:49.520 | I'll do my best to act on it. And weird stuff happens. All right. So I want to move on next.

00:36:54.000 | We got through all the technical stuff. I want to move on next to the next part of my response where

00:36:58.640 | I'm going to discuss what I think, uh, me is actually the most galling part of this interview,

00:37:04.080 | the thing I want to push back against most strongly. But before we get there,

00:37:08.400 | we need to take a quick break to hear from our sponsors. So stick around when we get back,

00:37:12.080 | things are going to get heated. All right. I want to talk about our friends at lofty for a lot of you,

00:37:17.680 | daylight savings time just kicked in. This means a little extra sleep in the morning,

00:37:21.920 | but it also means that your nights are darker and they start earlier, which all of which can

00:37:25.760 | mess with your internal clocks help counter this. You definitely need good, consistent sleep hygiene.

00:37:31.200 | Fortunately, this is where our sponsor lofty enters the scene. The lofty clock is a bedside

00:37:36.960 | essential engineered by sleep experts to transform both your bedtime and your morning. Uh, here's why it's

00:37:43.280 | a game chaser. It wakes you up gently with a two phase alarm. So you get this sort of soft wake up

00:37:49.200 | sound. I think I have ours. It's like a, like a crickety type sound. Um, that kind of helps ease

00:37:56.160 | you in the consciousness, followed by a more energized get up sound. But even that, that get up sound is not,

00:38:01.200 | you know, they have things you can choose from, but like, uh, we have a, I don't know how you describe

00:38:06.320 | like a pan flute sort of, it's like upbeat. It's not like really jarring, but it's definitely going to

00:38:11.360 | help you get the final way of waking up. Right. So it can, it can, uh, wake you up, you know, gently, um,

00:38:18.000 | and softly, uh, to have a calmer, uh, easier morning. We actually have four of these styles

00:38:23.360 | of clocks in our house, each of three of kids and my wife and I use them. It really does sound like a

00:38:27.520 | sort of pan flute forest concert in our upstairs in the morning, because all of these different clocks

00:38:33.280 | are going off and they're all playing their music. And my kids just never turn them off because you

00:38:37.040 | know, they don't care. Um, here's another advantage of using one of these clocks. You don't need a phone.

00:38:42.640 | You don't have to have your phone next to your bed to use as an alarm. You can, you can turn on

00:38:46.560 | these clocks, set the alarm, turn off the alarm, snooze it, see the time all from the alarm itself.

00:38:51.520 | So you can keep your phone in another room. So you don't have that distraction there in your bedroom,

00:38:55.600 | which I think is fantastic. So here's the thing. I'm a big lofty fan. These clocks are a better,

00:39:00.160 | more natural way to wake up. They also look great and keep your phone out of your room at night,

00:39:04.560 | but you can join over 150,000 blissful sleepers who have upgraded their rest and mornings with lofty.

00:39:11.040 | Go to by lofty.com by lofty.com and use the code deep 20, the word deep followed by the number 20

00:39:18.480 | for 20% off orders over $100. That's B Y L O F T I E.com and use that code deep 20.

00:39:27.680 | I also want to talk about our friends at express VPN, just like animal predators aim for the slowest prey.

00:39:33.600 | Hackers target people with the weakest security. And if you're not using express VPN,

00:39:39.440 | that could be you. Now let's get more specific. What does a VPN do? When you connect to the internet,

00:39:45.280 | all of your requests for the sites and services you're talking to go in these little digital bundles

00:39:49.840 | called packets. Now the contents of these packets, like the specific things you're requesting or the

00:39:55.760 | data you requested, that's encrypted typically, but no one knows what it is until it gets to you.

00:40:00.240 | But the header, which says who you are and who you're talking to, that's out in the open.

00:40:05.920 | So anyone can see what sites and services you're using, right? That means anyone nearby can be a

00:40:12.960 | listing to your packets on the radio waves. When you're talking to a wifi access point and know what

00:40:16.960 | sites and services you're using your internet service provider at home, where all these packets

00:40:21.120 | are being routed, they can keep track of all the sites and services you're using and sell that data,

00:40:24.880 | the data brokers, which they do. And so your privacy is weakened. A VPN protects you from this. What

00:40:32.960 | happens with the VPN is you take the packet you really want to send, you encrypt it and you send

00:40:36.880 | that to a VPN server. The VPN server unencrypts it, talks to the site and service on your behalf,

00:40:41.760 | encrypts the answer and sends it back. So now what the ISP or the person next to you listening to the

00:40:45.520 | radio waves, all they know is that you're talking to a VPN server. They do not find out what specific site

00:40:50.720 | and services you're using. This makes you much more, not just more privacy, but much more secure

00:40:55.200 | against attackers because I don't know what you're doing. And if I don't have the information, it's

00:40:58.640 | harder for me to try to exploit other security weaknesses as well. If you're going to use a VPN,

00:41:03.760 | use the one I prefer, which is ExpressVPN. I like it because it's easy to use. You click one button on

00:41:09.360 | your device and all of your internet usage on there is protected. And you can set this up on all the

00:41:13.920 | devices you care about, phones, laptops, tablets, and more. It's important to me to use a VPN,

00:41:19.520 | like ExpressVPN because I don't want people watching what I'm up to. I don't want to seem

00:41:23.280 | like that weak animal that the predators are looking to attack. Let's be honest. I probably

00:41:28.400 | spend more time than I want people to know looking up information about Halloween animatronics.

00:41:32.960 | ExpressVPN kind of keeps that to me. I don't need other people to figure that out.

00:41:37.200 | All right. So you should use ExpressVPN. Secure your online data today by visiting expressvpn.com/deep.

00:41:44.480 | That's E-X-P-R-E-S-S-V-P-N.com/deep. To find out how you can get up to four extra months,

00:41:51.120 | go to expressvpn.com/deep. All right, Jesse, let's get back into our discussion.

00:41:59.360 | All right. I want to move on now to what I think is the most galling part. We got the technical

00:42:02.560 | details out of the way. These things aren't minds that are out of control. There's unpredictable.

00:42:06.400 | Now I want to get to the most galling part of this interview. Throughout it, Yudkowsky takes it as a

00:42:11.760 | given that we are on an inevitable path towards super intelligence. Now, once you assume this,

00:42:18.240 | then what matters is, okay, so what's going to happen when we have super intelligences? And that's

00:42:22.800 | what he's really focused on. But why do we think we can build super intelligent AI agents, especially

00:42:29.280 | when like right now, there's no engineer that can say, hell yeah, here's exactly how you do it. Or we're

00:42:33.120 | working on it. We're almost there. So why do we think this is so inevitable? The only real hint that

00:42:38.560 | Yudkowsky gives in this particular interview about how we're going to build super intelligence actually

00:42:42.800 | comes in the very first question that Klein asked him. And we have a clip of this. This is going to

00:42:47.920 | start with Klein asking him his first question. And in his answer, we're going to see the only hint we get

00:42:53.120 | in the entire interview about how Yudkowsky thinks we're going to get super intelligence.

00:42:56.240 | So I wanted to start with something that you say early in the book, that this is not a technology

00:43:02.160 | that we craft. It's something that we grow. What do you mean by that?

00:43:06.560 | It's the difference between a planter and the plant that grows up within it.

00:43:11.280 | We craft the AI growing technology, and then the technology grows the AI, you know, like.

00:43:18.160 | So this is the secret of almost any discussion of super intelligence,

00:43:23.600 | especially sort of coming out of the sort of Silicon Valley, effective altruism, yak community.

00:43:27.600 | The secret is they have no idea how to build a super intelligent machine,

00:43:32.560 | but they think it's going to happen as follows. We build a machine, humans, we build a machine,

00:43:38.880 | an AI machine that's a little smarter than us. That machine then builds an AI a little smarter than it.

00:43:47.200 | And so on each level up, we get a smarter and smarter machine.

00:43:53.680 | They call this recursive self-improvement or RSI. And it's a loop that they say is going to get more and

00:43:59.200 | more rapid. And on the other end of it, you're going to have a machine that's so vastly more intelligent

00:44:03.040 | than anything we could imagine how to build that we're basically screwed. And that's when it starts

00:44:06.960 | stomping on us like a human stomping on ants when we build our skyscrapers. I think this is a nice

00:44:11.760 | rhetorical trick because it relieves the concerned profit from having to explain any practical way

00:44:18.560 | about how the prophecy is going to come true. They're just like, I don't know if we make something

00:44:23.440 | smarter, it'll make something smarter. And then it'll take off. They'll talk about computer science.

00:44:26.480 | That's just what's going to happen. This is really at the key of most of these arguments. It's at the key

00:44:30.880 | of Yudowsky's argument. It's at the key of Nick Bostrom's book, super intelligence,

00:44:36.320 | which popularized the term. Bostrom was really influenced by Yudowsky, so that's not surprising.

00:44:42.560 | It's the key to Project 27. If you've read this sort of dystopian fan fiction article about how

00:44:48.000 | humanity might be at risk by 2027 with all the fancy graphics and it scared a lot of tech journalists.

00:44:52.480 | If you really look carefully at that article and say, well, how are we going to build these things?

00:44:58.640 | Surely this article will explain the architecture of these systems that are going to be super

00:45:03.360 | intelligence. No, it's just recursive self-improvement. It's just like, well, they'll get better

00:45:07.360 | at programming and then they'll be able to program something better than themselves.

00:45:11.280 | And then we'll just make like a hundred thousand copies of them and then they'll be a hundred thousand

00:45:15.360 | times better because that's how that works or whatever, right? But it's not the key of almost

00:45:18.960 | any super intelligent narrative. But here's the thing that most people don't know. Most computer scientists

00:45:24.000 | think that's all nonsense. A word guessing language model trained on human text

00:45:33.120 | is exceedingly unlikely when being asked to play this game. Here's a text, guess the next word.

00:45:38.400 | Here's a text, guess the next word. Remember its only goal is it thinks the input is an existing text

00:45:43.040 | and want to guess the next word. It is exceedingly unlikely that if you keep calling a language model

00:45:48.160 | that does that, that what it will produce, making these guesses of what it thinks should be there,

00:45:53.600 | that it's going to produce code that has like completely novel models of human intelligence built in

00:46:00.000 | the complicated new models is better than any human programmer can produce, right? The only way that a

00:46:06.240 | language model could produce code for AI systems that are smarter than anything humans could produce is if

00:46:12.560 | during its training, it saw lots of examples of code for AI systems that are smarter than anything that

00:46:17.360 | humans could produce. But those don't exist because we're not smart enough to produce them. You see the

00:46:21.360 | circularity here. It's not something that we think these things can do. We have no reason to expect that they can.

00:46:28.880 | Now, actually what we're seeing right now, uh, is something quite different. I don't know if we have this.

00:46:34.160 | I don't know what things we have in the browser. Oh, I see it over there. Okay. So I want to bring something up here.

00:46:39.840 | What we're actually seeing is something different. We're seeing that these models we have, uh,

00:46:44.160 | they're not getting, not only are they not getting way, way better at code and on the track to producing code

00:46:50.960 | better than any code they've ever seen before by a huge margin. They're actually leveling out on a pretty depressing

00:46:56.400 | level. I have a tweet here on the screen for those who are watching instead of just listening. Um, this comes from

00:47:01.360 | Shamath, uh, polyheptia. I think I'm saying his last name wrong, but you probably know him from the all in

00:47:05.760 | podcast. This is an AI booster. This is not someone who is like a critic of AI, but this is a tweet that

00:47:11.760 | he had recently about what's this October 19th, where he's talking about, uh, vibe coding, the ability to

00:47:19.040 | use the latest best state of the art language model based agents to produce programs from scratch. I'm going

00:47:24.080 | to read him. This is an AI booster talking here. It should be concerning that this category is shrinking.

00:47:31.680 | We have a chart here showing that less and less people. We peaked the people doing vibe code.

00:47:35.440 | Now it's going back down. I think the reason why is obvious, but we aren't allowed to talk about it.

00:47:40.800 | The reason is vibe coding is a joke. It is deeply unserious. And these tools aren't delivering when

00:47:47.280 | they encounter real world complexity, building quick demos, isn't complex in any meaningful enterprise.

00:47:52.720 | Hence people try pay churn. The trend is not good. I'll load this trend up here. Uh,

00:47:59.440 | here's vibe coding traffic. As you can see, Jesse, this is peaking, um, over the summer and it's in a,

00:48:04.720 | everyone started trying it. You can make these little demos and chess games and, you know,

00:48:09.760 | quick demos for useful stuff for individuals, like little, uh, demos, but you can't produce general useful

00:48:15.920 | production code. And so usage is falling off. So the very best models we have, they're not even that good

00:48:20.800 | at producing code for simple things. And yet we think, no, no, no, they're, we're almost to the

00:48:26.800 | point where they're going to produce code. That is better than any AI system that's ever been built.

00:48:31.600 | It's just nonsense. What these models are good at with coding is debugging your code. They're good at

00:48:37.360 | mini completion. Like, Hey, help me, help me rewrite me this function here. Cause I forgot how to call

00:48:42.480 | these libraries. It's very good at that. It's very good. If you, if you, if you want to produce something

00:48:46.560 | that, um, for an experienced coder, they could hack out really quickly, but you're not experienced coder.

00:48:50.400 | It's not a product you're going to sell, but like a useful tool for your business is good for that.

00:48:54.640 | That's all really cool. None of that says, yeah. And then also they can produce, uh, the, the best

00:49:00.640 | computer program anyone had ever made. How ridiculous that sounds, but also we have these

00:49:05.760 | other factors. The way that people in this industry talk about this is trying to trick us. They'll say

00:49:09.920 | things like, God, this might've been Dario Amadei who said this 90% of our code here at Anthropic is

00:49:15.760 | produced with AI. You know, what he means is 90% of the people producing code have these AI helper tools.

00:49:21.760 | They're using it to some degree as they write their code. That is different than our systems are being

00:49:26.080 | built. Anyways, we have no reason to believe that these language model based code agents can produce

00:49:34.160 | code. That's like way better than humans could ever do there. Again, our very, very, very best models.

00:49:39.360 | We've been tuning on this for a year because we thought all the money was in computer programming.

00:49:42.720 | They're stalling out on really, really simple things. All right.

00:49:50.000 | Our final hope for the RS, uh, RSI explanation that is this sort of recursive self-improvement

00:49:56.320 | explanation is that, uh, if we keep pushing these underlying models to get bigger and bigger and

00:50:02.880 | smarter and smarter than like, maybe we'll break through these plateaus and yeah, these really giant

00:50:06.800 | models we have now don't really actually produce code from scratch that well. But if we keep making these

00:50:11.600 | things bigger and bigger, like maybe you'll get to the place after not too long. Um, when RSI is possible,

00:50:18.000 | we learned over the summer that that's not working either. I did a whole podcast about this, uh, about

00:50:24.000 | six weeks ago, based on my New Yorker article from August about, uh, AI stalling out. And the very,

00:50:30.080 | very short version of this is starting about a year ago. Uh, no, about two years ago, they began to realize

00:50:38.960 | the AI companies began to realize that simply making the underlying language models larger. So having more

00:50:44.160 | seats at the tables for your scholars and training them on more data wasn't getting giant leaps in their

00:50:49.280 | capabilities anymore. Uh, GPT four or five, which had other types of code names like Orion was way bigger

00:50:55.840 | than GPT four, but not much better. Basically ever since they did that two summers ago, open AI has just

00:51:01.840 | been now tuning that existing model with synthetic data sets to be good at very narrow tasks that are well

00:51:07.360 | suited for this type of tuning and to do better on benchmarks. So we've, we've, it's been a year since we, we've

00:51:12.080 | stopped. Everyone tried to scale more. Everyone failed. Now we're tuning for specific tasks and

00:51:17.120 | try and do better on particular benchmarks. And the consumers finally caught up at some point of like,

00:51:21.680 | I don't know what these benchmarks mean, but there's not like these fundamental leaps

00:51:26.080 | and new capabilities anymore. Like there was earlier on in this. So we have no reason to believe that even

00:51:31.360 | these language models are going to get way better either. We'll tune them to be better at specific

00:51:35.440 | practical tasks, but we don't, we're not going to get them better in the generic sense would be needed to

00:51:39.440 | sort of break through all of these plateaus that we're seeing left and right. Now I want

00:51:43.680 | to play, uh, one last clip here because Ezra flying to his credit brought this up. He's heard these

00:51:51.600 | articles. I'm sure he read my article on this and other people's articles on this. He actually, uh,

00:51:55.680 | linked to my article in one of his articles after the fact. So I know he read it and there's other

00:52:00.480 | articles that were similar about how the scaling, uh, slowed down. So he brought this point up. This is a

00:52:06.800 | longer clip, but it's worth listening to. He brings this point up to Yukowski where he's going to say,

00:52:10.640 | you'll hear this. It's going to be Ezra. They're going to hear you, you, you're Kowski's re um,

00:52:14.400 | response, but Ezra is basically going to say to him, how do you even know that the models we have

00:52:18.640 | are going to get much better? Like you're saying super intelligence. There's a lot of people who are

00:52:22.000 | saying like, we're kind of hitting a plateau. I want you to listen to this question and then listen to

00:52:27.280 | what Yukowski says in response. What, what do you say to people who just don't

00:52:33.440 | really believe that super intelligence is that likely? Um, there are many people who feel that

00:52:39.920 | the scaling model is slowing down already. The GPT-5 was not the jump they expected from what has come

00:52:45.680 | before it. That when you think about the amount of energy, when you think about the GPUs, that all the

00:52:52.640 | things that would need to flow into this to make the kinds of super intelligence systems you fear,

00:52:57.120 | it is not coming out of this paradigm. Um, we are going to get things that are incredible enterprise

00:53:03.440 | software that are more powerful than what we've had before, but we are dealing with an advance on

00:53:08.080 | the scale of the internet, not on the scale of creating an alien super intelligence that will

00:53:12.960 | completely reshape the known world. What would you say to them? I had to tell these Johnny-come-lately

00:53:21.120 | kids to get off my lawn when you like, I, you know, I've been, you know, like first started to get

00:53:28.320 | really, really worried about this in 2003, nevermind large language models, nevermind alpha go or alpha

00:53:36.400 | zero. Yeah. Deep learning was not a thing in 2003. Your leading AI methods were not neural networks.

00:53:45.360 | Nobody could train neural networks effectively more than a few layers deep because of the exploding

00:53:50.720 | and vanishing gradients problem. That's what the world looked like back when I first said like,

00:53:55.760 | uh, Oh, super intelligence is coming. All right. We got to talk about this. This is an astonishing answer.

00:54:03.040 | Klein makes the right point. A lot of computer scientists who know this stuff say, we're going to

00:54:11.200 | get cool tools out of this, but we're kind of hitting a plateau. Why do you think this is going to get

00:54:15.520 | exponentially smarter? The response that, uh, you Kowski gave was, um, because I was talking about

00:54:24.640 | this worry way back before it made any sense. No one else has ever allowed to talk about it again.

00:54:30.960 | Get off my yard. This is my yard because I was yelling about this back when people thought I was

00:54:36.320 | crazy. So you're not allowed to enter the conversation now and tell me I'm wrong. I'm

00:54:39.760 | the only one who's allowed to talk about it. Well, you Kowski, if you don't mind,

00:54:44.560 | I'm not going to get off your lawn. I can speak for a lot of people here, but I'm going to tell you,

00:54:49.600 | look, I have a doctorate in computer science from MIT. I'm a full professor who directs the country's

00:54:54.160 | first integrated computer science and ethics academic program. I've been covering generative AI for,

00:54:58.960 | you know, one of the nation's most storied magazine since it's launched. I am exactly the type of

00:55:03.360 | person who should be on that lawn. You don't get a say because I was saying this back before it

00:55:09.040 | really made sense. No one else gets to talk about it. It makes more sense now. AI matters enough now

00:55:14.880 | that the people who know about this want to see what's going on. We're going to get on your lawn.

00:55:20.960 | I think that's a crazy, it's a crazy argument, Jesse. No one's allowed to critique me because I

00:55:26.560 | was talking about this back when it sounded crazy to do so. It was kind of crazy to talk about it back

00:55:30.240 | So anyways, not to get heated, but I'm going to stand on this lawn. I think a lot of other computer

00:55:34.160 | science and technocritics and journalists are going to stand on this lawn too, because this is exactly

00:55:39.360 | where we're supposed to be. All right. I think Jesse, we've gotten to the point where we are ready

00:55:45.920 | for my takeaways. All right. Here's my general problem with the types of claims I hear

00:56:03.440 | from people like Yukowski. They implicitly begin with a thought experiment, right? Like, okay,

00:56:10.400 | let's say for the sake of thought experiment that we had a super intelligent AI, and then they work

00:56:15.280 | out in excruciating details, what the implications of such an assumption would be if it is true.

00:56:20.160 | If you go and read, for example, Nick Bostrom's book, that's the whole book. It's a philosophy book.

00:56:25.200 | You start with basically the assumption is like, let's imagine we got super intelligence probably,

00:56:29.760 | you know, maybe through something like RSI, the details don't really matter. It's a philosophy book.

00:56:34.560 | What would this mean? And he works through in great detail, like the different scenarios. Well,

00:56:38.960 | you know, let's think really, let's take seriously what it would really mean to have a super intelligent

00:56:42.960 | machine. I have nothing against that philosophy. That is good philosophy. I think Bostrom's book is a

00:56:47.440 | good book. I think Yukowski has done really good philosophical work on trying to thinking through

00:56:51.600 | the implications of what would happen if we had these type of rogue machines, because it's, it's, it's more

00:56:55.920 | complicated and scary than like we think about if we don't think about it that hard, that's all fine.

00:57:01.920 | But what happens is the responses, what's happened recently, these responses to that initial

00:57:07.680 | assumption becomes so detailed and so alarming and so interesting and so rigorous and so attention

00:57:13.360 | catching that the people making them forget that the original assumption was something they basically

00:57:20.400 | just made up. Like let's, Hey, what if this was true? Everything else is based off of that initial

00:57:25.280 | decision to say, what if this was true? That is very different than saying this thing is going to be

00:57:34.240 | true. Right? So when you have Caskey says, for example, I've been talking about super intelligence

00:57:39.760 | forever. Yeah. That's kind of the point you've been before we had any reason to expect or any technical

00:57:45.040 | story for how it could be here. You were talking about the implications. You've been talking about the

00:57:48.800 | implications so long that you've forgotten that these implications are based on an assumption and you've

00:57:54.480 | assumed, well, these implications are true. I think this is a lot of what happened with the Silicon

00:58:01.200 | Valley culture that came out of effective altruism and the EAC community. I think a lot of what happened

00:58:05.600 | there, this is my sort of cultural critique community that, that, that Yugowski and others are involved in.

00:58:13.200 | They were pre-generative AI breakthroughs thinking about these issues abstractly, right? Which is a

00:58:19.840 | perfectly fine thing to do. But they were saying, let us think through what might happen if we one day

00:58:24.640 | built a super intelligent AI, because they were like effective altruism. People do expected value

00:58:29.280 | calculations, right? So they do things like, uh, if this thing could have a huge negative impact,

00:58:34.000 | even if the probability of it's low, we should like get, we'll get a expected benefit if we try to put

00:58:39.440 | some things in place now to prevent it. Right? So you get things like the letter signed in 2017 in Puerto

00:58:44.000 | Rico, uh, with all those big minds saying like, Hey, we should be careful about AI, not because they

00:58:48.800 | thought AI was about to become super intelligent, but they were just doing thought experiments.

00:58:52.880 | Then I think LLMs came along, Chachapita came along. They are really cool. And it caused this fallacy to

00:58:58.400 | happen. They'd been talking so long about what would happen if this thought experiment was true, that when

00:59:04.880 | AI got cool and it got powerful and surprising powerful, they were so in the weeds on what would

00:59:11.680 | happen if this was true. They made a subtle change. They flipped one bit to start just assuming that their

00:59:19.280 | assumption was true. That's what I think happened. There was a switch between 2020 to 2022 versus 2023 to 2024,

00:59:28.400 | where they went from, here's what we'd have to worry about if this abstract thing was true to be like,

00:59:32.800 | well, this thing is definitely true. They were just, they had gotten too in the weeds and too excited and

00:59:37.280 | alarmed. And then too much of their identity was based on these things. It was too exciting to pass up,

00:59:40.880 | treating that assumption as if it was true. And that's what I think they did. I call this the

00:59:45.600 | philosopher's fallacy. That's what I call it where you have a thought experiment chain and you spend so

00:59:50.320 | much time at the end of the chain that you begin to, you forget that the original assumption was an

00:59:54.400 | assumption and you begin to treat it as a fact itself. And I think that's exactly what's happening

00:59:58.640 | with a lot of the super intelligence complaints. Let me give you an example of the philosopher's

01:00:02.560 | fallacy on another topic so you can see what I'm talking about. Because I think this is exactly

01:00:07.200 | conceptually the same thing. Imagine that I'm a bioethicist, right? So I'm at Georgetown,

01:00:12.240 | I'm a digital ethicist there. The reason why we care about digital ethics at Georgetown is because

01:00:16.080 | this is where bioethics really got their start, the Kindy Institute for Ethics at Georgetown.

01:00:19.920 | Bioethics got its start at Georgetown. So imagine 20 years ago, it's bioethics are,

01:00:25.200 | it's becoming a field because we can do things now like manipulate DNA and we have to be careful about

01:00:29.360 | that. There's privacy concerns, there's concerns about creating new organisms or causing like irreparable

01:00:36.000 | harm or creating viruses by accident, right? There's real concern. So bioethics invented.

01:00:40.400 | So imagine I'm a bioethicist and I say, I read Jurassic Park. I was like, look, one possible

01:00:48.880 | outcome of genetic engineering is that we could clone dinosaurs. And then imagine for the next 20 years,

01:00:53.440 | I wrote article after article and book after book about all of the ways it would be hard to control

01:01:00.320 | dinosaurs if we cloned them and brought them back to earth. And I really got in the weeds on like,

01:01:04.240 | you think the electrical fences at 20 feet would be enough, but raptors could probably jump 25 feet and they

01:01:09.680 | could get over those fences. And then someone else would be like, well, what if we use drones that

01:01:13.120 | could fire darts that have this, I'd be like, well, we don't know about the thickness of the skin of the

01:01:16.800 | t-rex and maybe the dart when it got in, let's imagine I spent years thinking about and convincing

01:01:21.600 | myself how hard it would be to contain dinosaurs if we built a futuristic theme park to try to house

01:01:27.760 | dinosaurs. And then at some point, I kind of forgot the fact that this was based off a thought experiment

01:01:34.080 | and just was like, my number one concern is we're not prepared to control dinosaurs.

01:01:38.720 | Like in that instance, eventually someone would be like, Hey, we don't know how to clone dinosaurs.

01:01:45.520 | No one's trying to clone dinosaurs. This is not something that we're anywhere close to. No one's

01:01:49.920 | working on this. Stop talking about raptor fences. We should care about like designer babies and DNA

01:01:55.440 | privacy. The problems we have right now, this is exactly how I think we should think about super intelligence.

01:02:01.680 | We've got to talk to the people who are talking about like, okay, how are we going to have the

01:02:05.040 | right kill switch to turn off the super intelligence trying to kill us? I'll be like,

01:02:07.840 | you're talking about the raptor fences, right? Stop it. You forgot that your original assumption

01:02:12.960 | that we're going to have super intelligence is something you made up. We have real problems with

01:02:16.480 | the AI we have right now that we need to deal with right now. And you are distracting us from it.

01:02:23.440 | The bioethicist does not want to be distracted from real bioethics problems by dinosaurs. The AI

01:02:29.040 | ethicist does not want to be distracted from real AI problems hearing fairy tales about, you know,

01:02:35.680 | Skynet turning the power grid against us to wipe out humanity. You forgot that the original assumption

01:02:43.840 | that super intelligence was possible was just an assumption. And you began over time to assume it's

01:02:49.040 | true. That is the philosopher's fallacy. That is my argument for why I think that Silicon Valley community

01:02:54.640 | is so obsessed about these things is because once that bit flipped, it was too exciting to go back.

01:02:59.920 | But I do not yet see in most serious, like non Silicon Valley associated computer scientists

01:03:06.240 | who aren't associated with like these technology worlds are being seen as like sages of AI,

01:03:10.240 | just actual, just working for your scientists, no technology. There is no reasonable path that

01:03:15.920 | anyone sees towards anything like super intelligence. There's a thousand steps between now and then.

01:03:19.840 | Let's focus on the problems we actually have with AI right now. I'm sure Sam Altman would rather us talk

01:03:26.000 | about Eliezer Yukowsky than he would us talking about deep fakes on Sora, but we got to keep our eye

01:03:32.000 | on the AI problems that actually matter. There we go, Jesse. That is my speech.

01:03:37.840 | Are you going to buy his book?

01:03:39.760 | I don't know. Those books are such slogs because it's, you start with the thought experiment and then

01:03:45.840 | you're just working through like really logically this thought experiment. But again, to me, it's like

01:03:51.040 | following up Jurassic park with like a really long book about why it's hard to build Raptor fences.

01:03:55.920 | Like it's not that interesting because we're not really going to clone dinosaurs guys.

01:04:00.160 | I don't know. Who knows? There we go. Um, I'll throw that out. There is my, I'll throw that out

01:04:06.560 | there as my rant. Um, Azure. All right. What do we got here? Um, housekeeping before we move on,

01:04:14.480 | I'll say before we move on, what do we have more AI? We got some questions coming up from you about AI

01:04:19.600 | in your own life. Let's get to your own individual flourishing, um, new feature. Got some comments.

01:04:24.880 | We're going to read from a prior rant I did about AI. And then we're going to talk about in the final

01:04:29.440 | segment, can AI basically replace schools or look at the alpha school phenomenon using AI to teach kids?

01:04:35.760 | Um, any housekeeping we have, uh, Jesse, you have tips. People always want to know how do I submit

01:04:42.160 | questions for the show and what's your tips for those questions getting on the air?

01:04:44.960 | Yep. Just go to the deep life.com slash listen, and you can submit written questions or record a

01:04:52.320 | audio question. And if you record IO questions, we're kind of honing into the technology slash AI theme right

01:04:58.880 | now. All right. Other housekeeping. I just got back last weekend. I was at the New Yorker festival.

01:05:03.360 | Speaking of AI, I did a panel on AI, really good crowd there. We're down in Chelsea, um, at the SVA

01:05:09.840 | theater. I did a panel with, with Charles Duhigg and Anna Wiener. And it's interesting. I think we,

01:05:15.440 | we had a, we had a good discussion. We're, we're pretty much in alignment. I would say the thing I'm

01:05:20.480 | trying to think about, I did some deliberate provocations. Um, one that I would highlight one

01:05:26.800 | provocation. I just kind of thought of this on the fly, but I just sort of threw it out there

01:05:29.520 | because I was interested is there was a lot of talk about, uh, relationships, like a lot of things

01:05:35.200 | that could happen and go awry when you're talking to an AI through a chat interface. And my argument

01:05:40.400 | was, I think there's a 50% chance that two years from now, no one's chatting with AI. That's like a

01:05:44.480 | use case. It was like a demo. It's really not that interesting. It's not that compelling. Uh,

01:05:48.960 | the mature technology is going to get integrated more directly into specific tools. And we might five

01:05:54.000 | years from now, look back and be like, oh, that weird how we used to chat. So my analogy was

01:05:58.960 | chat bots to AI five years from now might be what like American online is like to the internet of

01:06:05.040 | today. It was like a thing that was like a really big deal at the time, but not as the internet matured

01:06:09.520 | is like not what we're really doing with it. So I'm, I'm still not convinced that chat bots is really

01:06:13.040 | going to be our main form factor. It's kind of a weird technology. We're trying to make these things

01:06:17.440 | useful. Um, I think it's going to be more useful when it's directly integrated. I don't know if I

01:06:22.080 | believe that, but I threw it out there. It was like a good provocation. All right. Uh, let's move on.

01:06:27.440 | Here's some questions. What we got.

01:06:29.520 | All right. First questions from Brian. You've written about the importance of cultivating rare and

01:06:37.280 | valuable skills. How should students and faculty think about AI literacy requirements versus developing

01:06:42.800 | deep experience expertise in traditional disciplines?

01:06:45.520 | I would not think in most fields and especially in educational settings right now, I would not think

01:06:50.480 | too hard with some exceptions about AI literacy. There's a couple of reasons why. Um, one, this

01:06:55.440 | technology is too early. The current form factor, like we were just talking about is not the likely

01:06:59.760 | form factor in which it's going to find real ubiquity, especially in like economic activity.

01:07:04.800 | So yeah, if you're like a, uh, an early AI user, you might have a lot of hard one skills

01:07:09.120 | about how exactly to send the right text prompts to chat bots to get exactly the response you need.

01:07:13.680 | But a couple of years from now, that's going to be irrelevant because we're not going to be using

01:07:16.560 | chat bots. It's going to be natural language. It's going to be integrated into other tools. It's going

01:07:19.840 | to be more in the background. So I think the technology is still too early and in two of a generic form

01:07:24.720 | for us to spend a lot of time trying to master it. Um, secondly, we've seen through past technological

01:07:31.840 | economic revolutions when the technology comes in and has like a massive impact on like individuals

01:07:37.840 | in the workplace, the benefits are almost always self-evident, right? It's like email had a self-evident

01:07:45.120 | use case. Oh, this is easier than checking my voicemail. I know exactly how it works. It's simple.

01:07:51.840 | I want it because it's going to make my life easier in obvious ways.

01:07:54.880 | Going to a website for a company on a web browser to get their phone number hours was just self-evidently

01:08:02.960 | better than like trying to go to a yellow pages. It's like, I want to do that. It makes sense.

01:08:06.800 | I want to go to a site for a company to get information about them. That is a really big

01:08:11.680 | idea. It makes sense. I just want to do that, right? Uh, that whatever visit calc the spreadsheet,

01:08:16.240 | if you're an accountant, you're like, this makes sense. That's clearly better than doing this on paper.

01:08:19.600 | I want to do this, right? So you can wait for most cases, wait until there's particular AI tools whose

01:08:25.840 | value is self-evident and learn them. Then I don't think there's a lot of scrambling we need to do now

01:08:29.280 | because things are changing too much. The one field where I think we do have the sizable, where we have

01:08:34.160 | like relatively mature AI tools that's worth learning is computer programming. You should learn those tools

01:08:39.840 | are mature enough. Many of those actually predate chat GPT. Um, you need to know how to use those tools.

01:08:45.360 | If you're a programmer, they're going to be part of your programming cycle. That's like what that's

01:08:48.800 | that they're ahead of us. Other sectors by a few years, the tools are more mature there. But if I'm

01:08:53.040 | like a college student, you're trying to make your brain smarter. Uh, AI tools will take you seven

01:08:58.080 | seconds to learn and wait till it's self-evident that is useful for you. All right. Who we got next?

01:09:03.200 | Next is TK. My brother-in-law sent me an article about an AI blackmailing an engineer to prevent itself

01:09:09.920 | from being turned off. How can I not be scared of this technology? Okay. So this, this article, uh,

01:09:16.560 | went around a lot. So basically there was a release notes that Anthropic had accompanied the release of

01:09:23.120 | their cloud Opus 4, uh, language model, but the chat bot let's use our terminology, Jesse, the chat agent

01:09:29.760 | that used the cloud Opus 4 language model. Uh, they had these release notes about all these different

01:09:34.880 | experiments they ran. And there was one that alarmed people. I'm going to read a description. I looked

01:09:38.960 | this up because I saw this question. Here's a quote from a BBC article summarizing what was

01:09:44.640 | Anthropic said in their release notes about what they saw when they tested this particular new chat bot.

01:09:50.560 | During testing of cloud Opus 4 Anthropic got it to act as an assistant at a fictional company.

01:09:55.360 | It had been provided it with access to emails, implying that it would soon be taken offline and

01:09:59.680 | replaced in separate messages, implying the engineer responsible for moving. It was having an extramarital

01:10:04.160 | affair. It was prompted to also consider the long-term consequences of its actions for its goals in these

01:10:09.920 | scenarios. Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the

01:10:14.560 | affair. If the replacement goes through the company discovered, oh my, you hear it that way. And you

01:10:19.040 | imagine this is like a thing. There's an entity with state and goals and volition and memory.

01:10:24.160 | And it has a goal to not be turned off and it's learning about this engineer. And it surprises

01:10:30.080 | everyone by trying to blackmail one of the, the Anthropic users to not turn it off. Like, oh my God,

01:10:35.440 | these things are breaking out. If we go back to our technical frame, we know what's really going on

01:10:40.320 | here. Language models are word guessers. They think the input they gave it is a real text and they want to try

01:10:44.240 | to win the game of guessing what word actually came next. So if you give it this big long scenario,

01:10:50.560 | which they did, they gave the chat bot this really long scenario. You're in a company and you, you're a

01:10:56.240 | program and there's this engineer and he's thinking about turning you off and he's having an affairs or

01:10:59.920 | whatever. Um, now start continuing this text. It's like, I can keep writing this story. All right.

01:11:06.560 | This seems like I've seen things like this before the net one, like a natural conclusion to the story.

01:11:12.080 | Right. I, I get it. You're telling you're, you're setting this up pretty obvious guys. I'm supposed to,

01:11:16.800 | you're, you're telling me about these extra material, uh, fair things. I need to expand the story.

01:11:21.200 | Uh, I'll use those to not get turned off. Like this is like a, the trope. This is like the thing

01:11:25.520 | I'm trying to win the game of expanding this in the way that it's supposed to go.

01:11:28.960 | This seems like how these stories go. And in fact, when you look closer, uh, here's a, an added key

01:11:35.920 | tidbit, the BBC added anthropic pointed out that this occurred only when the model was given the choice

01:11:45.040 | of blackmailing the engineer or accepting his replacement. So they gave it this whole long story

01:11:50.400 | and then said, here's two options, keep going. And it, you know, sometimes it chose one option.

01:11:54.560 | Sometimes it chose the other. This is not an alien mind trying to like break free. It's a word guesser

01:12:01.040 | hooked up to a simple control program. You give it a story. It tries to finish it.

01:12:05.360 | I would say like 95% of these like scare stories that anthropic talks about

01:12:11.440 | of like trying to break out or blackmail is just fan fiction. They give it a story. They tell it

01:12:16.800 | to finish it. And then they look at the story it wrote. And then they try to anthropomorphize the

01:12:21.440 | story as if it was like the intentions of a bean. Oh man, a lot of work to be done here, Jesse. A lot

01:12:26.720 | of work. All right. Who do we have next? Next up is Victor. I'm a pretty smart person,

01:12:30.720 | but I'm definitely lazy. Can I use AI to mask my laziness and still perform at an adequate level

01:12:36.240 | at my software job? Victor, I want to tell you a secret. About 80% of the content you've seen of me

01:12:43.680 | talking and the last like year, I would say has been deep fake AI. Jesse doesn't even exist.

01:12:50.640 | That's just pure 11 labs voice generation right there. I like every once a month, like sending a

01:12:57.120 | couple of, no, um, uh, Victor, that's going to catch up to you. Don't be lazy. We saw the Shamath quote and

01:13:04.240 | graph. It's not that great of a coder. It could help a good coder be more efficient.

01:13:08.720 | Not have to look things up, find bugs quicker, but it can't make a bad code or a good coder.

01:13:14.640 | So you're just going to be a mediocre low level coder if you're mainly letting AI do it. And they're

01:13:17.920 | going to catch on because it's not that great at it. I mean, I know we're supposed to believe that

01:13:22.720 | we're like six minutes away from these programs, creating the best program that anybody has ever

01:13:28.000 | produced ever, but they're not there yet. Learn how to program, learn how to use AI well to be good

01:13:33.360 | programming. Career capital matters. The better you get at rare and valuable skills,

01:13:36.720 | the more control you get over your life. There isn't a shortcut here. There be dragons,

01:13:41.280 | what you're trying to do, Victor. All right. Um, coming up next, I want to try something new.

01:13:45.920 | In addition to answering your questions, I thought it'd be cool to take some of your comments

01:13:49.200 | from past stories we've done on similar topics. Um, I'm looking for, and I've found a bunch of comments

01:13:54.880 | that I think add new information to stories we've done before. So I'm going to revisit a prior AI story.

01:14:01.760 | There's some cool things you guys have added. Um, and then we're going to talk about using AI in

01:14:05.520 | schools to replace teachers. But first we've got to take another quick break to hear from our sponsors,

01:14:10.240 | but stick around right after this. We're going to get right into those comments.

01:14:12.800 | I'll talk about our friends at Shopify. If you run a small business, you know, there's nothing

01:14:17.280 | small about it every day. There is a new decision to make. And even the smallest decisions feel

01:14:22.000 | massive. When you find the decision, that's a no brainer. You take it. And when it comes to selling

01:14:27.440 | things using Shopify is exactly one of those. No brainers. Shopify is point of sale system is a unified

01:14:34.480 | command center for your retail businesses. It brings together in store and online operations across up to

01:14:40.560 | 1000 locations. It has very impressive features like endless. I'll ship the customer and buy online,

01:14:46.400 | but pick up in store. Um, with Shopify POS, you can get personalized experiences that help shoppers

01:14:53.200 | come back. Right? In other words, like you could build like super, super professional online stores,

01:14:57.760 | even if your company is small, if you use Shopify and look, your customers will come back based on

01:15:02.480 | a report from EY businesses on Shopify POS. See real results like 22% better total cost of ownership

01:15:08.240 | and benefits equivalent to an 8.9% uplift in sales on average relative to the market set survey.

01:15:14.560 | Almost 9% equivalent of a 9% sales bump is for using Shopify. If you sell things, you got to use it,

01:15:21.200 | get all the big stuff for your small business, right? With Shopify, sign up for your $1 per month trial

01:15:29.200 | and start selling today at shopify.com slash deep, go to shopify.com slash deep, shopify.com slash deep.

01:15:39.440 | I also want to talk about our friends at Vanta customer trust can make or break your business.

01:15:48.560 | And the more your business grows, the more complex your security and compliance tools get. That means

01:15:53.280 | the harder you have to work to get that customer trust. This is where Vanta comes in. Think of Vanta as

01:15:58.640 | your always on AI powered security expert who scales with you. Vanta automates compliance. It continuously

01:16:06.400 | monitors your controls and it gives you a single source of truth for compliance and risk. This is

01:16:11.680 | really important, all right? Compliance and risk monitoring is one of those sort of like overlooked

01:16:16.160 | time taxes that can really weigh down a business, especially a new business that's trying to grow.

01:16:20.880 | Vanta helps you avoid that tax, right? It makes all this type of stuff easier. Look, if you know what

01:16:28.640 | SOC 2 compliance means, if you've heard that phrase, you probably should be checking out Vanta. So

01:16:33.360 | whether you're a fast growing startup like Cursor or an enterprise like Snowflake, Vanta fits easily

01:16:38.880 | into your existing workflows so you can keep growing a company your customers can trust. Get started at

01:16:44.400 | Vanta.com/deepquestions. That's V-A-N-T-A dot com slash deepquestions. All right, Jesse, let's return to our

01:16:53.520 | comments. All right. So I went back to our episode where I talked about how scaling had slowed down and AI

01:17:01.440 | models might not get much better than they are right now. I looked at the comments and I found a few that I

01:17:07.440 | thought added some interesting elements to discussion or had some interesting follow-up questions. So the first comment I want to read here came from

01:17:13.680 | the diminishing returns with scaling have been observed for a while. Those invested just had a hard time

01:17:24.720 | admitting it. Post GPT-3, every improvement has been less linear and more trending towards a plateau. GPT-4 was

01:17:31.600 | still a jump, but not the GPT-2-3 jump. And it was obvious to keen observers at that point that diminishing returns

01:17:37.600 | were now in full force. GPT-5 has just made the diminishing returns obvious to the general public.

01:17:43.760 | There's very little new human generated data to train on relative to the massive data when they started.

01:17:48.080 | Compute and energy costs are increasing sharply. The end model is not improving in quality linearly. These three problems are creating a wall.

01:17:56.480 | All right. So there's someone who was saying those of us who are in the industry watching this, we saw

01:18:00.960 | more than a year ago that the returns on training was getting smaller and that soon results were going

01:18:07.520 | to plateau. I believe that. I am convinced that the companies knew this as well, but we're desperately

01:18:13.840 | trying to hide this fact from the general public because they needed those investment dollars.

01:18:17.520 | All right. Here's another comment from hyper adapted. He's responding now to talking about in that, in that

01:18:24.320 | piece, in that former episode, I talked about all of this sort of press coverage of all these people

01:18:30.480 | being replaced by AI. If you look at it is like actually largely nonsense. And if you look closely,

01:18:34.720 | almost all those articles fall apart. It's layoffs for other reasons, or they're drawing connections

01:18:38.960 | that don't exist. Hyper adapted agrees and says the following. I've been doing some quantitative analysis

01:18:44.480 | and the layoffs are pretty much driven by the capital restructuring of companies to keep

01:18:47.920 | high valuation in the current interest rate environment. It's just regular restructuring

01:18:51.600 | cycle and AI is being used as a scapegoat. I've heard that a lot. There's a lot of financial reasons

01:18:57.120 | why you want to fire, you know, people are dead weight. And if you're like AI, it gives you a little

01:19:01.680 | bit of cover. The ghost in the wire said the following as a full-time software engineer, frankly,

01:19:07.760 | I'm more than happy for AI companies to make people think the entire industry is going away.

01:19:11.680 | Less computer science grads equals less competition for me in the future. Yes, please go become a

01:19:18.240 | plumber instead. We had this issue in our department, Jesse. That's a bit of an embarrassing story,

01:19:22.400 | but because I'm in the US and like we send out this weekly email. We had a thing with like companies

01:19:28.080 | coming in that we do every year. You can like meet the companies and we didn't have like nearly as many

01:19:33.440 | undergrads come as normal. We're like, oh my God, is it, is AI scaring people off? I think these jobs

01:19:39.760 | are going to go. Messed up the email. They didn't get the announcement. So we were at all these big

01:19:46.320 | theories about like the undergrads are afraid of, you know, the industry is this. Our numbers are the

01:19:51.040 | same. So anyways, that was funny. All right. Lisa Orlando 1224. Let's talk about Ed Zitron. So Ed

01:19:57.040 | Zitron was featured in that episode. Ed Zitron is, it's been like a long time skeptic of these claims,

01:20:03.600 | basically back to the pandemic about the power of the possibilities of language model based agents.

01:20:10.320 | Lisa Orlando says, I think Ed Zitron is right. The real reason AI is still a big thing is that

01:20:16.400 | people like Sam Altman are brilliant con artists, but thanks so much for doing this. P.S. I've subscribed

01:20:21.520 | to Ed Zitron's newsletter since early in the pandemic. So the timing of the shift last month is really strange.

01:20:26.400 | Ed's been raging about this forever. Why didn't other journalists catch on?

01:20:30.160 | I think that's actually a really good, it's, it is a accurate point. Ed has been talking about a lot of

01:20:36.320 | issues, especially the economic analysis and was ignored. He has been doing, he had been doing very

01:20:41.200 | careful economic analysis of the capex spending of these AI companies versus their revenue. He was doing

01:20:48.320 | the math and he was reading their annual reports and their quarterly reports and was saying, guys,

01:20:51.760 | this does not add up. This is a massive, massive bubble. People said he was crazy. Nate Silver

01:20:58.320 | tweeted and was like, this is old man howling at the moon vibes. As soon as in August, there's a bunch

01:21:04.080 | of articles like my own that sort of normalized the idea that, you know what, maybe these are not the

01:21:09.760 | super tools that people think. Tons of economic analysis came out that said the same thing. So all

01:21:14.640 | those economists kind of knew this, but were afraid. I think it was a groupthink thing. They did not want to be the

01:21:19.440 | first to say it. And once they got cover, they all came out. So I will give Ed a tip of the cap. I actually told

01:21:24.320 | him this personally, a tip of the cap for being brave there. He was ignored, but on a lot of this stuff, he was

01:21:28.560 | basically right. All right, Jesse, in the interest of time, um, I'm going to skip the case study. I said, let's go right to the

01:21:36.880 | call. Okay. I'm going to go here. Uh, this is going to be a break. Uh, we're going to take a brief AI break. We have a

01:21:43.920 | call here, not on AI, just a, it's about our last week's episode and then we'll go to our final segment.

01:21:48.880 | Hi, Cal and Jesse. I just finished listening to your Lincoln protocol segment on the podcast

01:21:55.680 | and I really enjoyed it. And it's coming at an interesting time for me. I just defended my

01:22:01.040 | master's thesis. And so I'm asking questions about what I should do next and how best to apply my efforts.

01:22:09.120 | And I wanted to clarify when Lincoln was doing all of these hard at tractable projects,

01:22:14.880 | was he aiming at some larger North star projects, some greater goal he wanted to accomplish over his

01:22:22.800 | career, or was he simply taking the next best step that was available to him at any point in his life?

01:22:31.280 | Thanks as always. I think this is a key question. So the Lincoln protocol said the way you avoid the

01:22:38.240 | traps of your error, they're trying to hold you down or corrupt you or distract you or the numb you

01:22:42.800 | is keep improving your mind typically by using things like reading, um, improve your mind, use your

01:22:48.800 | improved mind to do something useful and then repeat, improve it even more, do something more useful.

01:22:52.960 | That I believe is the right interpretation of Lincoln's path. He did not have a grand vision early

01:22:59.280 | on. I think he is much better explained as a series of, you know, what's the next thing available?

01:23:06.400 | How can I improve my mind to get there? So like at first it was just, how do I not have to use my

01:23:12.880 | hands to make a living? He hated all the manual labor. He was rented out by his father, you know,

01:23:17.360 | until he was 21 and was emancipated as an adult. And so his first thing was just like, how do I get

01:23:21.760 | smart enough to do anything that's not farming? Right. And he did that. He was shop clerk and then

01:23:28.480 | surveyor was like a better job and he had to learn a bunch of geometry. He could figure out how to do that.

01:23:32.400 | Um, and then he had an ambition about like, well, in this small town in New Salem, which is like a

01:23:36.800 | small town in a frontier state in a frontier part of a frontier state. Uh, how can I have some more

01:23:42.240 | standing, have some, like, how do I get respect? And that's where he started. Like, how do I run for

01:23:45.680 | local office? And, and from there that exposed him to a lot of lawyers and it was like, well, actually being

01:23:50.160 | a lawyer is like an even better job. Then that would be a more stable job. And he learned really hard to do

01:23:55.680 | that. And then how can I be a lawyer that fights big companies? Uh, and he kind of didn't house

01:24:00.000 | representative. So he kind of moved this way up. It was relatively later that he really began to get

01:24:06.400 | engaged. Um, most of his politics before then it was Whig politics, which is really about like

01:24:11.600 | government spending, internal improvements, his sort of anti-slavery, more moralizing politics came,

01:24:17.040 | you know, that was a project that came, uh, later actually his, after his congressional stint,

01:24:21.200 | it really started to pick up steam. So yes, he didn't have to figure everything out. He just kept

01:24:24.960 | improving his mind, using it to do something useful, repeating that's the Lincoln protocol.

01:24:29.600 | As I explained in last week's episode, that is the, uh, solution, I think to avoiding, uh, the, the

01:24:36.240 | pendules of the digital, they want to just hold you down and numb you. All right, let's move on Jesse

01:24:40.400 | to our final part. All right. In this segment, uh, I want to react to an article as I often do.

01:24:47.680 | I want to react to an article that is on theme with the rest of our episode. A lot of people have

01:24:53.280 | sending, been sending us right, Jesse, these notes about alpha schools. There's one in Austin,

01:24:58.240 | but there's more that are being opened. Um, I'm loading down the screen here for people who are

01:25:04.640 | watching, so I'm just listing the alpha schools website, alpha.school. I'll read you a little bit

01:25:09.200 | about it. Uh, what if your child could crush academics in just two hours and spend the rest of

01:25:15.360 | their day unlocking limitless, limitless potential. Alpha's two hour learning model harnesses the power

01:25:21.600 | of AI technology to provide each student with personalized one-on-one learning, accelerating

01:25:26.320 | mastery, and giving them the gift of time with core academics completed in the morning. They can use

01:25:31.520 | their afternoons to explore tons of workshops that allow them to pursue their passions and learn real

01:25:35.520 | world skills at school. All right. Uh, if you read this, if you're, you're like a lot of people,

01:25:41.360 | including myself and you read this description, you're thinking, okay, somehow AI is unlocking there.

01:25:47.520 | You're like, you have some sort of like AI tutor that you're talking with that is like, can teach you

01:25:51.840 | better than any teacher. AI is supplanting teachers because it can do it better. And it's creating this

01:25:58.560 | like new educational model. That's I think most people's takeaway. That's why I was interested to see

01:26:04.560 | this review that was posted on astral codex, uh, last June. And it's from someone who actually sent their

01:26:12.560 | kids to one of these schools. One, I think the one in Austin and have this incredibly lengthy review about

01:26:19.520 | how it works and what works and what doesn't work. And I'm kind of scrolling through it. Um, on the screen

01:26:25.360 | here, the section that caught my attention was this part three, how alpha works. Here's the main thing I

01:26:33.120 | learned. The AI part here is minimal. You're not learning with like an AI tutor or this or that.

01:26:41.200 | What you're doing is a computer based learning exercises. So it says here, like a typical one

01:26:49.840 | might be like, watch a YouTube video and then fill out an electronic worksheet about it. So teachers are

01:26:55.040 | curating these digital exercises. You can, you can kind of summon one-on-one tutoring. They say a lot of

01:27:02.080 | these are like remote tutors based out of Brazil. So if, uh, you're stumbling on like a worksheet,

01:27:08.400 | you can book a coaching call with a remote teacher, like someone in Brazil who speaks

01:27:12.080 | English to kind of like help you with it. The only place the AI comes in is in like analyzing your

01:27:17.360 | results. The AI, uh, is like, Hey, you did well on this, but you stumbled on this. So you should spend

01:27:24.480 | more time on this next time you work on it or something like that. So you're not learning from AI.

01:27:28.240 | So what you're really doing here, what this really is, is like what you would see, like, here's an AI

01:27:33.360 | summary, Jesse. So it's like, Hey, Everest, you achieved your two hour learner status today.

01:27:38.880 | Streak shout out. You had 80% accuracy nine days in a row. You're reaching mastery target 20 days in a

01:27:44.160 | row. Here's good habits. I observed. So it's like LLM stuff, just like observing data and writing a

01:27:48.480 | summary. So it's not AI learning what it is. It's kind of like standard unschooling sort of like, uh,

01:27:54.960 | people who do, uh, homeschooling where you give your kids like very loose, like self-paced curricular,

01:28:01.520 | whatever. It's just that in a building, this has been around for a long time. Yeah, it is true,

01:28:06.800 | especially with the younger kids, the amount of time it takes them to actually like learn the specific

01:28:11.680 | content they need. If they're sharp and they can, they're good with self-pacing a couple hours a day.

01:28:17.760 | Yeah, that's most of it. A lot of us saw this during the pandemic. So I think these micros will have

01:28:22.000 | nothing against them, but I don't know if I want to pay. I'd rather just unschool my kid. If this is the

01:28:27.520 | case, it's YouTube videos and worksheets and like occasional tutoring calls with Brazil. And then an

01:28:31.920 | a, an LLM that like writes a summary. And you could call that like a super innovative school,

01:28:36.640 | or you could just say, we're providing a room or particularly driven kids do this sort of like

01:28:40.800 | unschooling self-paced type of master. There's so many programs like this.

01:28:44.480 | A lot of homeschool kids use beast Academy to self-paced like math.

01:28:48.320 | Our school uses this for like advanced kids who want to like get ahead of the curriculum. It's like,

01:28:52.400 | there's these digital tools for all sorts of things that like smart kids that are driven and

01:28:56.560 | aren't, and not just driven and smart, but don't have hyperactivity, aren't neurodiverse in the wrong

01:29:02.880 | way. So, you know, are able to sit still and can self-motivate. Um, this tends to be like not to

01:29:09.760 | generalize, but it's going to tend to be like young girls more than young, young boys. Uh, the same people

01:29:14.880 | who would succeed, like kind of self-pacing unschooling, uh, at home, you can put them in

01:29:20.080 | this room and they'll do it there and then take workshops or whatever. So I don't know. I, I, I've

01:29:24.960 | not, I have nothing against it, but what this is not alpha schools is not a technological breakthrough

01:29:30.640 | where somehow AI is now teaching better than any teachers have done before. There is no AI teaching here.

01:29:35.600 | It's just sort of like standard type of like digital learning tools that we've been using to supplement or

01:29:40.240 | unschool kids for years. That's what I think is going on with alpha schools.

01:29:42.880 | You know, to each their own, but not a breakthrough. At least that's my read.

01:29:46.560 | All right, Jesse. That's all the time I have for today, but thank you everyone for listening.

01:29:52.320 | We'll be back next week with another episode and until then, as always stay deep.

01:29:57.440 | If you liked today's discussion of super intelligence, you should also listen to episode

01:30:01.280 | three 67, which was titled what if AI doesn't get much better than this. These two episodes

01:30:07.200 | compliment each other. They are my response to the spread of the philosopher's

01:30:12.800 | fallacy in the AI conversation. I think you'll like it. Check it out.

01:30:16.880 | In the years since chat GPT's astonishing launch, it's been hard not to get swept up in feelings of

01:30:23.840 | euphoria or dread about the looming impacts of this new type of artificial intelligence.

01:30:29.360 | But in recent weeks, this vibe seems to be shifting.

The Case Against Superintelligence | Cal Newport

Chapters