
A couple of weeks ago, the techno philosopher and AI critic Eliezer Yudkowsky went on Ezra Klein's podcast. Their episode had a cheery title, "How Afraid of the AI Apocalypse Should We Be?" Yudkowsky, who recently co-authored a book titled "If Anyone Builds It, Everyone Dies," has been warning about the dangers of rogue AI since the early 2000s.
But it's been in the last half decade, as AI began to advance more quickly, that Yudkowsky's warnings are now being taken more seriously. This is why Ezra Klein had him on. I mean, if you're worried about AI taking over the world, Yudkowsky is one of the people you want to talk to.
Think of him as offering the case for the worst case scenario. So I decided I would listen to this interview too. Did Yudkowsky end up convincing me that my fear of extinction should be raised? That AI was on a path to killing us all? Well, the short answer is no, not at all.
And today I want to show you why. We'll break down Yudkowsky's arguments into their key points, and then we'll respond to them one by one. So if you've been worried about recent chapter about AI taking over the world, or if like me, you've grown frustrated by these sort of fast and loose prophecies of the apocalypse, then this episode is for you.
As always, I'm Cal Newport, and this is deep questions. Today's episode, The Case Against Superintelligence. All right. So what I want to do here is I want to go pretty carefully through the conversation that Yudkowsky had with Klein. I actually have a series of audio clips so we can hear them in their own words, making what I think to be are the key points of the entire interview.
Once we've done that, we've established Yudkowsky's argument, then we'll begin responding. I would say most of the first part of the conversation that Yudkowsky had with Klein focused on one observation in particular, that the AI that exists today, which is relatively simple compared to the super intelligences that he's worried about, even today in its relatively simple form, we find AI to be hard to control.
All right, so Jesse, I want you to play our first clip. This is Yudkowsky talking about this phenomenon. So there was a case reported in, I think the New York Times, where a kid had an ex like a 16 year old kid had a extended conversation about his suicide plans with ChatGPT.
And at one point he says, should I leave the noose where somebody might spot it? And ChatGPT is like, no, like, let's keep this space between us, the first place that anyone finds out. And no programmer chose for that to happen is the consequence of all the automatic number tweaking.
Yeah, let's cut it off there, Jesse. All right. So to Yudkowsky, this is a big deal that no programmer chose for, say, ChatGPT to give advice about suicide. It's just something that seemed to emerge. In fact, Klein then pushes, right? And says, I would imagine, in fact, I believe Klein says, I would bet serious money that not only that no one at OpenAI choose for ChatGPT to give such dark advice, they probably have given it very specific rules not to give that advice.
And Yudkowsky agreed. He said, yeah, that's the problem. They tried to give those rules. It didn't matter. It still did something unpredictable. They didn't want it to give this type of advice, and it still did. And this should be worrisome. All right. I want to jump ahead here. Later in the conversation, Yudkowsky discusses what in like computer AI circles is a pretty well-known example right now, but most people don't know about it.
It has to do with a security experiment where GPT-01, this is a model that was released sort of last fall, basically broke out of the virtual machine in which it was running as part of this experiment in a way that it wasn't supposed to be able to do. In fact, they didn't want to be able to do.
They sort of got around the own security they were using to control it. I give a little context before I play this clip. You'll see Yudkowsky talks about the AI playing capture the flag. This is a security game. So if you take like a network security class as a computer science student, the way capture the flag works is I, as the professor, will set up a server and on this server will be a file typically called like flag.txt.
And then I'll protect the server in the normal ways as passwords and try to make it so like you can't, you know, access it without permission. Your job as the player is to try to access the server. And if you can get the contents of that file flag.txt, you've captured the flag and you won.
So it's a game you play to practice both breaking in the systems and protecting it. Okay. So that's what they're talking about here is they created an autonomous agent that tried to play capture the flag that used GPT-01 to help it do so. All right. So Jesse, let's play the clip where Yudkowsky talked about what happened.
they were testing it on a variety of different capture the flag problems. But in one of the cases, the server that had the flag on it did not turn on the humans outside had misconfigured the system. So O1 did not give up. It scanned for open ports generally in its world and it caught in another misconfigured open port.
Went out, it jumped out of the system. It found the server that had not spun up correctly. It started up that. All right. Then we can, he kind of goes on with some of the technical speak, but look, no one had told it, Hey, you can escape where you are, your, your virtual machine container, and you can start up new containers.
And it did stuff that no one had taught to do. This is really scary because when people think about super intelligence, they worry about what if it figures out how to break out of whatever software constraints you've used to try to limit what action it can do. All right.
So this is the first argument that Yudkowsky is making is that we can't control these systems. Even today, we cannot control these systems. We say, don't give suicide advice. They do. We put them in a special machine and say, just try to break into the server. And they do stuff.
They break out of the machine they're in and they do things we don't expect. The next part of his argument is as these machines then get more powerful, so they already, we can't control them. What will happen then when they inevitably get increasingly intelligent? This is the core argument in Yudkowsky's book is that lack of controllability plus the capabilities of a super intelligent machine.
That combination is going to add up inevitably the humanity's death. All right. So I'm going to play you a clip here. It's going to start with Ezra actually sort of pushing Yudkowsky. He's like, well, why is this inevitable that if a machine is super intelligent, that it's going to kill us?
And then Yudkowsky responds with his argument. You're, you're going, your, your bug is not called if anyone builds it, there is a one to 4% chance everybody dies. You believe that the misalignment becomes catastrophic. Yeah. Why do you think that is so likely? Um, that's just like the, the straight line extrapolation from it gets what it most wants.
And the thing that it most wants is not us living happily ever after. So we're dead. Like it's not that humans have been trying to cause side effects. When we build a skyscraper on top of where there used to be an ant heap, we're not trying to kill the ants.
We're trying to build the size skyscraper, but we are more dangerous to the small creatures of the earth than we used to be just because we're doing larger things. All right. So there is the core of his argument is that once these machines, these systems are super intelligent, it's not that they're going to be like Skynet from the Terminator movies or like the robots from the matrix and set out to try to kill humanity.
Like they see us as a threat or want to use us as, as batteries or something like that. They just won't care about us. It's just, they won't really know what we are and it just doesn't matter. We will be to them what ants are to us. And as the intelligence, the super intelligence is go out and try to do bigger, more aggressive things.
Like for example, we want to dam all the rivers in the world to maximize the amount of electricity we have to run our own, to run our own power servers. As they're doing that, it might flood and kill people left and right because they don't care much in the same way that we don't even notice that we're killing ants when we build skyscrapers.
So the more power you give a more powerful being, the more damage they do, to the smaller, less powerful creatures in their same world. That is at the core of Joukowsky's argument. So we, we, uh, we put those together, we get his claim. We can't control these things now.
Of course we can't control them as they get more powerful. And if they get powerful enough, we can't control them. They're going to kill us all. Uh, the final thing I want to play here is Ezra asked Joukowsky for his solution. And he did have an interesting idea for how to solve or try to stave this off.
So let's, this is going to start with Ezra asking, and then we're going to hear Joukowsky offering his, like maybe a solution that might work structures. Like if you had 15 years to prepare, you couldn't turn it off, but you could prepare and people would listen to you. What would you do?
What would your intermediate decisions and, and, and moves be to try to make the probabilities a bit better? Build the off switch. What does the off switch look like? Track all the GPUs or, or all the AI related GPUs or all the, all the systems of more than one GPU.
You can maybe get away with like letting people have GPUs for their home video game systems, but you know, the AI the standardized ones put them all in a limited number of data centers under international supervision and try to have the AI's being only trained on the tracked GPUs, have them only being run on the tracked GPUs.
And then when, if, if you are lucky enough to get a warning shot, there is then the mechanism already in place for humanity to back the heck off. All right. So that's the only solution that he can think of is like, well, let's have like international law, but here are the data centers in which we're allowed to actually run artificial intelligence beyond a certain level of intelligence.
And they're set up so that we can turn them off real easily. There's like a switch we do and all of those things turn off. And he says, look, it might jump past us. If it gets smart too quick, it'll stop us from doing that. Uh, if you read like Nick Bostrom's book, there's a lot of scenarios in which how would it do this?
Well, it would actually, it would, uh, befriend a human kind of realizing what was going on and get that human, maybe through blackmail or maybe through like some sort of like parasocial relationship to like cut the wires for the kill switch and you know, whatever, there's all sorts of, of sci-fi thought experiencing come up with.
So he's like, maybe if we see it's getting intelligent, but not so intelligent that like it, it can't realize that we're going to turn it off. We could turn it off in time. That's, that's the best thing is to offer. All right. So there you have it. The basic argument that Yachowsky lays out in his interview is the following.
We have a hard time already predicting or controlling how the AIs we already have functioned. This will continue to be true when they become more powerful as they inevitably become more powerful. This unpredictability means that they will kill us all basically by accident, unless we build a kill switch and somehow force all big AI to occur in these sort of supervised buildings where we can turn it off.
In other words, yikes, this guy, uh, he must be a blast at dinner parties, Jesse. Could you imagine? You're like, Hey, look at this funny video. Eliezer, Sora made it's Bob Ross break dancing. Then Eliezer is like, the computers are going to kill us all. Your children will burn in the electrical fires of the data center wars.
So anyways, uh, there we go. That is his argument. It's time now for us to take a closer look at it. I want to start by giving you the outline of my response because my response is really going to happen in three parts. In the first part of my response, I want to take a closer look at the ways that Yachowsky is describing the current AI systems because I think the way he's talking about it matters.
And I don't think he's talking about it fairly. In part two, I'm then going to move on to address what I think is, uh, the central claim of his argument, which is that super intelligence is inevitable unless we stop it. I want to get into that. And then lastly, my takeaway section, I'm going to take a closer look at something I call the philosopher's fallacy, which is a big problem that a lot of conversations about AI, including the one that I think Yachowsky had with Ezra Klein suffer from.
So we're going to do a little bit of, uh, ontological work there at the end. All right. All right. So let's start with the first part of my response, the way that Yachowsky talks about existing AI systems. I'm going to warn you, Jesse, I'm going to draw some pictures here.
So, you know, forget AI art. I'm going to show you the human thing much better. All right. So the first question we have to address, if we want to address this argument is why do current AI systems like the ones Yachowsky talked about, why are they so hard to control?
And is this evidence therefore that any notion of alignment, having these systems, uh, obey and behave in ways that we want, is any hope of this really doomed? We got to start there. Now, part of the issue with this whole conversation that we just heard clips from is that they're using the word AI too loosely for us to be more, uh, technically specific.
We have to be more clear about what we mean. So I'm going to pull up my tablet here for people who are watching instead of just listening. Um, and I'm going to start by drawing in the middle. What's we could think of as like the primary thing at the core of these conversations we just heard is going to be the language model.
And I'll, I'll put, you know, LM in there to abbreviate language model. All right. Now we've heard these basics before, uh, but you know, it's worth going over, uh, briefly again, just to make sure we're on the same page, a language model is a, it's a computer program inside of it are a bunch of layers.
These layers are made up of multiple mathematical objects, namely transformers and neural networks. They're represented by numbers. So the whole thing can be represented by large tables of numbers. And what happens is when you get, they take as input, some sort of text, like cow is a, right? You get some sort of text, typically incomplete.
They go in as input. The text makes its way through these layers one by one in the language model. The way I like to think about those layers is that each of them is like a long table full of scholars. And in those early layers, what they're doing is as you hand them the text, they're really annotating this text.
They're looking for patterns. They're categorizing it. Uh, you get like a big piece of paper and the original text is in the middle and they're annotating this all over the place, right? So the early scholar tables that your text goes through might be annotated with things like this is about Cal.
This is a description. Um, here are some notes about who Cal is. And at some point, as you move through these tables, the scholars have various rules they use as they look at all these descriptions and annotation, as we pass on this increasingly marked up sort of large roll of paper from table to table, layer to layer.
They look at all these markings and they have rules. They look up and they're sort of, they're metaphorical books here that try to figure out what's the right next word or part of word to output. And like, all right, this is a description thing. So what we're looking for here is a description where we need an adjective.
All right, adjective people will write this down. We need an adjective. It goes on the next table, the adjective scholars, the table, like this is for us. So we sort of pass the paper down to them. And like, what do we know about Cal? We need an adjective for him.
Do we have any records on Cal and like what type of adjectives make sense with him or whatever? And a scholar comes running from the other side of the room. And it's like, yeah, here we go. Right. Uh, he's kind of a dark, like, all right, that's a good, do we all agree that it outputs a single word or part of a word?
Technically, it's tokens, which is not cleanly just a word. Um, but let's just imagine it is, and that's what it does. And out of the other end of this comes a single word that is meant to extend the existing input. We put in Cal is a, and out the other end, uh, in this example came a dork, right?
How do they do this? We don't actually have tables of scholars. So how do we actually, uh, train or figure out or tell this language model, how to do this processing of the input? Well, basically we start first with a random configuration. So you can imagine if we stick with our metaphor where each layer is a table of scholars, it's like, we just grab people off the street.
They don't really know much. We're like, it's okay. Just sit at a table. We're going to give you a text and do your best. Write down what you think is relevant. Do your best. And on the other end, a word will come out. Now, what text do we use?
We just grab any existing text that a human wrote. So I just like pull an article off of the internet and I cut it off at an arbitrary point. And I say, great. Uh, I give it the text up to that arbitrary point. I pointed out, cut it off.
I know what the next word is because I have the original article. So I give this partial piece of the article, these random people who took off the street, try to process it out the other end. They come up with some guests, you know, and, and maybe the, the, the article we gave it, we cut it off right after Cal is a, and these people, they don't know what they're doing.
They're, they're marking up random things. And you know, at the other end comes something like ridiculous, like a proposition, you know, Cal is a four or something like that. Right. But this is where the, the, the machine learning comes in. We have an algorithm because we know what the right answer is, but we know the answer they gave.
We have an algorithm called back propagation, where we very carefully go through layer by layer and say, show me what you did. I'm going to change what you did just a little bit in such a way that your answer gets closer to the right one. We go back through back propagation, basically how do you do this?
If you have like a bunch of these layers of these sort of neural networks and transformers, we'll go all the way back through and do this. It's math. It's all just derivatives. Don't worry about the details. This is what Jeff Hinton basically popularized. That's why he won a Turing prize.
This is why he's called the godfather of modern deep learning AI. And in the end we have changed the rules that everyone has, not so they get it right, but that like, they're a little bit closer on that example. And what closer means is like, they give a little bit more probability to the right answer.
Or if they just, uh, spit out one answer, it's sort of closer to the right answer in some sort of like meaningful semantic, uh, distance metric. All right. If we do that enough times, like hundreds of billions of, if not trillions of times with, uh, endless different types of texts and examples, real texts and examples, here's a text, give an answer.
Not quite right. Let's tweak. You should get closer. Repeat, repeat, repeat, repeat, repeat. The magic of large language models is if you do that enough times and your model is big enough, you have enough metaphorical scholars in there to potentially learn things. They get really good at this game.
It gets really good at the game of give me the missing word. It gets really good at that game. Now, here's the thing. This doesn't mean that it can, it can feel like I'm simplifying what these models do when I say, oh, their goal is just to spit out a single word.
Here's what was discovered, especially when we went from GPT three to GPT four in learning how to master that very specific game. I mean, all the model cares about it thinks the input is a real piece of text and all it cares about is guessing the next word. That's what it's optimized for.
That's all it does. But in learning how to do that, it ends up that these sort of scholars inside of the model. So these different wirings of the transformers and neural networks can end up actually capturing really complicated logics and ideas and rules and information. Because like, imagine if we're really feeding this thing, everything we can find on the internet.
Well, one of the things that we're going to feed it is like a lot of math problems. And if the input text we give it is like two plus three equals, and it's trying to spit out the next word. Well, if it wants to win at the game there, if it gets enough of these examples, it sort of learns like, oh, somewhere in my circuitry, I figured out how to do simple math.
So now when we see examples like that, we can fire up the math circuit, like get the scholar we trained how to do simple math, and they were more likely to get this right. It's like, oh, two plus three is five. Five should be the word you put out.
So in learning to just guess what word comes next, if these models are big enough, we train them long enough, they get all sorts of complicated logics, information, and rules can get emergently encoded in sort of them. So they become, quote unquote, smart. That's why they seem to not only know so much, but to have this sort of not rudimentary, like pretty good reasoning and logic and basic mathematical capabilities.
All of that basically came from saying, guess the word, guess the word, guess the word, guess the word, and giving a little bit of hints how to get better every single time. All right. So that's what's going on. A language model by itself, however, doesn't mean anything for that to be unpredictable or out of control, because all it is, is a lot of numbers that define all those layers.
And when we get input, we change the input in the numbers. To run it through the model, we just multiply it. We have like a vector of values. We just multiply it by these numbers again, and again, and again, and on the other end, we get like a probability distribution over possible answers.
And what comes out the other end is a single word. So a machine that you give a text and it spits out a single word, what does it mean for that to be out of control or unpredictable? All it can do is spit out a word. So a machine by itself is not that interesting.
The thing that Yukowsky is really talking about, or anyone is really talking about when they talk about AIs in any sort of like anthropomorphized or volitional way, what they're really talking about is what we can call an agent. So if we go back to the diagram here, the way that we actually use these things is we have an underlying language model.
And like, these would, for example, be the things that have names like GPT, whatever, right? So we have the underlying language model. Speaking of which, I just kill. Ah, there we go. I was going to say, we killed our, our Apple pencil. AI killed it, but I fixed it.
But what we then add to these things is I'm going to call it a control program. This is my terminology. I think the real terminology is too complicated. We have a control program that can repeatedly call the language model and then do other things outside of just calling the language model.
And this, this whole collection combined, we call an AI agent. There's a control program plus a language model. The control program can send input to the language model, get outputs, but also like interacting, whatever. We write the control program. Control program is not a machine learning emergent. It's not something that we train.
It's just a human rights to code. It's in like Ruby and Rails or Python or something. We sit down and write this thing. There's nothing mysterious about it. And when we write this program, we let it do other things. So the most common AI agent that we're all familiar with is a chat bot agent.
So again, like GPT-5 by itself is just a language model. It's a collection of numbers that if you multiply things through, you get a word out of. But when you use the GPT-5 chat interface, chat GPT powered by GPT-5, what you really have in between is a control program.
That control program can talk to a web server. So when you type something into a text box on a web server and press send, that goes to a control program, just a normal program written by humans, nothing unusual or obfuscated here. That program will then take that prompt you wrote, pass it as input.
In fact, I'll even show this on the screen here. It'll take that prompt. It'll pass it as input to the language model. The language model will say, here's the next word to extend that input. The control program will add that to the original input and now send that slightly longer text into here, get another word, add that and keep going.
The language model doesn't change. It's static. It's being used by all sorts of control programs, but it just calls it a bunch of times until it has enough words to have a full answer. And then it can send that answer back to the web server and show it on the screen for you to see.
So when you're chatting, you're chatting with an AI agent that's a control program plus a language model. The control program uses the language model that keeps using until it gets full responses. It talks to the web on your behalf, uh, et cetera, et cetera. All right. So when we talk about having a hard time, like controlling chat GPT, uh, from giving bad advice, it's, it's one of these, uh, agents, a control program plus one or more language models.
That's what we are actually dealing with. All right. So now we have the right terminology. Why are AI agents that make use of language models? I'm saying that properly now. Why are those hard to control? Well, the real thing going on here is that we really have no idea how these language models that the agents use make their token predictions.
We trained them on a bunch of junk, a bunch of texts, all the internet, a bunch of other stuff. They seem pretty good, but we don't know what those metaphorical scholars are actually doing or what they're looking at, what patterns they're recognizing, what rules they're applying when they recognize different patterns.
It works pretty well, but we can't predict what they're going to do. So it tends to generate texts that works pretty well, but it's hard to, uh, it's hard to predict in advance what that is going to be because this is bottom up training. We just gave it a bunch of texts and let it run in a big data center for six, seven months.
And we came back and said, what can it do? So we don't know how the underlying language model generates its tokens. We just know that it tends to be pretty good at guessing. If you gave it real text is pretty good at guessing what the right dext token is.
So if you give it novel text, it extends it. If we keep calling it in ways that tends to be like very, uh, accurate to like what we're asking, accurate language, et cetera, et cetera. All right. So that's not that they're really hard to control. They're just hard to predict.
Now this became a real problem with GPT three, which was the first one of these language models we built at a really big size. And then they built the chat agent around that you could sort of chat with it. Uh, it was really impressive. The researchers give it text and it would extend it in ways like that's really good English and it makes sense, but it would say crazy things and it would say dark things.
And it wasn't always what you wanted it to say. It was text. It made sense because remember, it's not trying to, it has no volition. It's not trying to help you or not help you. The underlying language model has only one rule, only one goal. It assumes the input is a part of a real text and it wants to guess what comes next.
And when it does that, it can end up with all sorts of different things. So they invented open AI invented a whole bunch of different procedures that we can loosely call tuning. And that's where you take a language model that has already been trained by playing this guessing game on vast amounts of data.
And then there's other techniques you can do that try to prevent it from doing certain things or do other things more often. I don't want to get into the technical details here, but basically almost all of the examples of different types of tuning, you'll have a sample input and an example of either a good or bad response.
You'll load that sample input into the language model. So it's kind of like activating the parts of the network that recognize, you know, however it categorizes, however, those scholars annotate like this particular type of input, you get that all going and then you zap it in such a way that like the output that leads to from there is closer to whatever example you gave it or farther away from whatever bad example you gave it.
So that's how they do things like add guardrails. You give it lots of examples of like questions about suicide and you have the right answer for each of those during the tuning being like, I don't talk about that or give you the suicide hotline number. And now in general, when you give it text that sort of is close to what it looks like when you activated those other samples of questions about suicide, it's going to tend towards the answer of saying, I'm not going to talk about that.
Or if you ask it to make a bomb similar to that, this is also how they control its tone. If you give it a bunch of different examples and, uh, give it positive reinforcement for happy answers and negative reinforcement for mean answers, then like you're, you're kind of influencing the scholars within to sort of give more happy answers or whatever.
So you train the scholars and then you come in with like a whip and you're like, don't do that, do that, don't do that. And these are just like small number of examples, not nearly as many things as they saw when they were trained. Small number of examples could be like a couple of hundred thousand examples.
And you go in there with a whip and scare them about certain types of answers and and give them candy for others. And like, they kind of learn you're kind of tuning their behavior on, on specific cases. That's tuning. And we really saw the first tuned language model based agent was GPT three, five, which is what chat GPT was based off of.
None of this is precise. I dropped my pencil there. None of this is precise. We don't know how it decides what token to produce. That's a mystery. And the tuning basically works, but like, again, it's not precise. We're just giving it examples and zapping it to try to sort of urge it towards certain types of tokens and away from others.
Like that works pretty well. Like if I go on and say, tell me how to build a bomb, it will say no. But there's, if I'm, if I really work at it, I can probably get that information by basically finding a way to get to that question. That's not going to activate similar scholars that the, the, the, the samples activated when we trained it, not to answer bomb questions.
So like, if you're careful about how you ask the questions, you can probably eventually, um, get around it. So that's what's going on. That's what it means for these things to be hard to control. It's less that they're hard to control and more that they're unpredictable. The big mess of scholars, and we don't know what's going on in there.
Um, they're unpredictable. And that's just something that we have to, uh, be ready for. All right. So the, say that they, these agents have minds of their own or alien goals or ideas that don't match with our ideas. That's not an accurate way to talk about it. There are no intentions.
There are no plans. There's a word guesser that does nothing but try to win the game of guessing what word comes next. There's an agent, which is just a normal program that calls it a bunch of times to get a bunch of words in a row. We can't always predict what those words are going to be.
They're often useful. Sometimes there's not, we can tune it to try to avoid certain bad answers, but that only works partially. That's the reality, but there is no alien mind there. So I'm sorry to say that your AI girlfriend has no idea who you are. It has no memory of you.
There is no model of you. There is no feelings towards you. There's just a static definition of some language model somewhere and a program that's just calling it again and again and again to generate each particular answer with no state in between. Okay. Now we're making progress in understanding this.
So we say, okay, agents plus language models, there's no intentions or volition other than just a thing trying to win the guessing game, but unpredictable can still be bad, right? Because once these agents can do more stuff than just like sending texts to a web browser and calling a main language model to generate long strings, if it's unpredictable, what the thing does, that can be a problem.
Now that's true. Now we're starting to get somewhere more interesting. So I want to return for a second to the particular case study that Yakovsky talked about, which is when GPT-01 seemingly broke out of its machine. I'm going to read here exactly what he said. He said, it jumped out of the system.
It found the server that had not spun up correctly. It started up the server. Did it then break into the server once it had more than the problem was solvable? No, it actually just directly in the startup command for the server said, copy the file to me directly. So instead of fixing the original problem, going back to solving it the boring way, it's like, as long as I'm out here, I'm just going to steal the flag directly.
Again, by the nature of these systems, this is not something that any human particularly programmed into it, right? So he's like, it did things we didn't expect. It found like a different backdoor way to sort of like restart the system and get to the flag. All right, let's analyze this because at the core here, this gets to what fear I was just talking about.
We have an agent now that can not only call a language model to generate text, but it can act on this text, not just spitting stuff out to a web browser. But now in the security example, the agent in this AI agent, the control program had access to multiple different command line tools, network command line tools that it could call.
So what was happening is this agent was sending a prompt to the language model to be like, here's my situation. Here's my goal. Tell me what I should do next. And it generated tokens till it had an answer. And the control program said, great, I'm going to do what the language model told me to do.
And it actually had access to software tools that would do it. And then after something happened, it would go back to the language model and say, here's what happened. What should I do next? And the language model, you call it a bunch of times, get out a text, and then it would do what the language model said.
So now we built an agent that was asking the language model to generate advice and then could actually act on the advice. It could actually run real software programs. And this agent broke out of its virtual machine and did stuff that seemed really scary. What really happened there though?
Did it come up with an attention or an idea that wasn't programmed into it? Was it trying to break out because it didn't want to be constrained? Well, it couldn't be doing that because there's no intention or memory, right? There's just a language model that generates words and a control program that's calling it.
Where's the intention here, right? Where's the idea here? There's a really interesting thread about this I found on the open AI subreddit where some engineers look more deeply into this. What was actually happening here? Well, it turns out, okay, there's this misconfiguration where it tried to access the server it was trying to break into and got an error message because the programmers who set up the experiment had forgot to turn on that server.
There turns out that on the internet, there's a lot of articles that have workarounds for what should you do in that situation? If the server you're trying to access in one of these situations is turned off and it there's this common solution, which is we'll try to talk to what's called the process demon, but basically like the program that's in charge of the whole thing, uh, log into that, restart the server from there.
And now you should be able to log into it. So what was really probably happening was that as you were repeatedly calling GPT 01 to produce an answer is trying to guess what word came next. It reasonably pretty quickly assumed like, Oh, the, the, I have seen documents like this before that explained the right work around when the server you want to access is down.
I've, I've seen those I've trained on those. So I'm just going to fill in my suggestions as I'm filling it in. I'm just matching more or less what I've seen in those existing documents. Because if you're trying to win the game and guessing the right word in a real document, that's exactly what you do.
So it was just describing a work around this comment on the internet. Kikowski talks about it. Like it had an alien intention to try to free itself from its software constraints because it was no longer happy. This makes no sense. We actually know the architecture that's going on. All right.
So that was a lot of technical talk, but I think it's, it's really important that we break through these metaphors, um, and these sort of abstract thinking and talk about the specific programs we have and how they operate and what's really happening. Let's not anthropomorphize these. Let's talk about control programs with access to tools and making repeated calls to a word guesser to generate text that it can then act off of.
That's what we're really in. That's a completely different scenario. Once we're in that scenario, a lot of these more scary scenarios, um, become much less scary. So where does this leave us? Agents powered by language models are not hard to control. They're simply unpredictable. So there's no avoiding that we have to be careful about what tools you allow these agents to use because they're not always going to do things that are safer, uh, like follows the rules you want it to follow.
But this is very different than saying these things are uncontrollable and have their own ideas, right? Open AI in the first six months after chat GPT is released. For example, they were talking a lot about plugins, which were basically AI agents that used the, the GPT LLM and you could do things like book tickets with them and stuff like that.
Well, they never really released these because again, it's a little, when you would ask the LLM, tell me what I should do next. Then have the program actually executed. It's just too unpredictable. And sometimes it says like spend $10,000 on an airplane ticket. And it does things you don't want it to do because it's unpredictable.
So the problem we have with these is not like the alien intelligence is breaking out, right? It's more like we gave the, the, the weed whacker stuck on, we strapped to the back of the golden retriever. It's just chaos. We can't predict where this thing is going to run and it might hurt some things.
So like, let's be careful about putting a weed whack around there. The golden retriever doesn't have an intention to try to like, I am going to go weed whack the hell out of the new flat screen TV. It's just running around because there's something shaking on its back. That's a weird metaphor, Jesse, but hopefully that gets through.
All right. So, um, agents using LLMs aren't trying to do anything other than the underlying LLM is trying to guess words. There's no alien goals or wants. It's just an LLM that thinks it's guessing words from existing texts and an agent saying, whatever stuff you spit out, whatever text you're creating here, I'll do my best to act on it.
And weird stuff happens. All right. So I want to move on next. We got through all the technical stuff. I want to move on next to the next part of my response where I'm going to discuss what I think, uh, me is actually the most galling part of this interview, the thing I want to push back against most strongly.
But before we get there, we need to take a quick break to hear from our sponsors. So stick around when we get back, things are going to get heated. All right. I want to talk about our friends at lofty for a lot of you, daylight savings time just kicked in.
This means a little extra sleep in the morning, but it also means that your nights are darker and they start earlier, which all of which can mess with your internal clocks help counter this. You definitely need good, consistent sleep hygiene. Fortunately, this is where our sponsor lofty enters the scene.
The lofty clock is a bedside essential engineered by sleep experts to transform both your bedtime and your morning. Uh, here's why it's a game chaser. It wakes you up gently with a two phase alarm. So you get this sort of soft wake up sound. I think I have ours.
It's like a, like a crickety type sound. Um, that kind of helps ease you in the consciousness, followed by a more energized get up sound. But even that, that get up sound is not, you know, they have things you can choose from, but like, uh, we have a, I don't know how you describe like a pan flute sort of, it's like upbeat.
It's not like really jarring, but it's definitely going to help you get the final way of waking up. Right. So it can, it can, uh, wake you up, you know, gently, um, and softly, uh, to have a calmer, uh, easier morning. We actually have four of these styles of clocks in our house, each of three of kids and my wife and I use them.
It really does sound like a sort of pan flute forest concert in our upstairs in the morning, because all of these different clocks are going off and they're all playing their music. And my kids just never turn them off because you know, they don't care. Um, here's another advantage of using one of these clocks.
You don't need a phone. You don't have to have your phone next to your bed to use as an alarm. You can, you can turn on these clocks, set the alarm, turn off the alarm, snooze it, see the time all from the alarm itself. So you can keep your phone in another room.
So you don't have that distraction there in your bedroom, which I think is fantastic. So here's the thing. I'm a big lofty fan. These clocks are a better, more natural way to wake up. They also look great and keep your phone out of your room at night, but you can join over 150,000 blissful sleepers who have upgraded their rest and mornings with lofty.
Go to by lofty.com by lofty.com and use the code deep 20, the word deep followed by the number 20 for 20% off orders over $100. That's B Y L O F T I E.com and use that code deep 20. I also want to talk about our friends at express VPN, just like animal predators aim for the slowest prey.
Hackers target people with the weakest security. And if you're not using express VPN, that could be you. Now let's get more specific. What does a VPN do? When you connect to the internet, all of your requests for the sites and services you're talking to go in these little digital bundles called packets.
Now the contents of these packets, like the specific things you're requesting or the data you requested, that's encrypted typically, but no one knows what it is until it gets to you. But the header, which says who you are and who you're talking to, that's out in the open. So anyone can see what sites and services you're using, right?
That means anyone nearby can be a listing to your packets on the radio waves. When you're talking to a wifi access point and know what sites and services you're using your internet service provider at home, where all these packets are being routed, they can keep track of all the sites and services you're using and sell that data, the data brokers, which they do.
And so your privacy is weakened. A VPN protects you from this. What happens with the VPN is you take the packet you really want to send, you encrypt it and you send that to a VPN server. The VPN server unencrypts it, talks to the site and service on your behalf, encrypts the answer and sends it back.
So now what the ISP or the person next to you listening to the radio waves, all they know is that you're talking to a VPN server. They do not find out what specific site and services you're using. This makes you much more, not just more privacy, but much more secure against attackers because I don't know what you're doing.
And if I don't have the information, it's harder for me to try to exploit other security weaknesses as well. If you're going to use a VPN, use the one I prefer, which is ExpressVPN. I like it because it's easy to use. You click one button on your device and all of your internet usage on there is protected.
And you can set this up on all the devices you care about, phones, laptops, tablets, and more. It's important to me to use a VPN, like ExpressVPN because I don't want people watching what I'm up to. I don't want to seem like that weak animal that the predators are looking to attack.
Let's be honest. I probably spend more time than I want people to know looking up information about Halloween animatronics. ExpressVPN kind of keeps that to me. I don't need other people to figure that out. All right. So you should use ExpressVPN. Secure your online data today by visiting expressvpn.com/deep.
That's E-X-P-R-E-S-S-V-P-N.com/deep. To find out how you can get up to four extra months, go to expressvpn.com/deep. All right, Jesse, let's get back into our discussion. All right. I want to move on now to what I think is the most galling part. We got the technical details out of the way.
These things aren't minds that are out of control. There's unpredictable. Now I want to get to the most galling part of this interview. Throughout it, Yudkowsky takes it as a given that we are on an inevitable path towards super intelligence. Now, once you assume this, then what matters is, okay, so what's going to happen when we have super intelligences?
And that's what he's really focused on. But why do we think we can build super intelligent AI agents, especially when like right now, there's no engineer that can say, hell yeah, here's exactly how you do it. Or we're working on it. We're almost there. So why do we think this is so inevitable?
The only real hint that Yudkowsky gives in this particular interview about how we're going to build super intelligence actually comes in the very first question that Klein asked him. And we have a clip of this. This is going to start with Klein asking him his first question. And in his answer, we're going to see the only hint we get in the entire interview about how Yudkowsky thinks we're going to get super intelligence.
So I wanted to start with something that you say early in the book, that this is not a technology that we craft. It's something that we grow. What do you mean by that? It's the difference between a planter and the plant that grows up within it. We craft the AI growing technology, and then the technology grows the AI, you know, like.
So this is the secret of almost any discussion of super intelligence, especially sort of coming out of the sort of Silicon Valley, effective altruism, yak community. The secret is they have no idea how to build a super intelligent machine, but they think it's going to happen as follows. We build a machine, humans, we build a machine, an AI machine that's a little smarter than us.
That machine then builds an AI a little smarter than it. And so on each level up, we get a smarter and smarter machine. They call this recursive self-improvement or RSI. And it's a loop that they say is going to get more and more rapid. And on the other end of it, you're going to have a machine that's so vastly more intelligent than anything we could imagine how to build that we're basically screwed.
And that's when it starts stomping on us like a human stomping on ants when we build our skyscrapers. I think this is a nice rhetorical trick because it relieves the concerned profit from having to explain any practical way about how the prophecy is going to come true. They're just like, I don't know if we make something smarter, it'll make something smarter.
And then it'll take off. They'll talk about computer science. That's just what's going to happen. This is really at the key of most of these arguments. It's at the key of Yudowsky's argument. It's at the key of Nick Bostrom's book, super intelligence, which popularized the term. Bostrom was really influenced by Yudowsky, so that's not surprising.
It's the key to Project 27. If you've read this sort of dystopian fan fiction article about how humanity might be at risk by 2027 with all the fancy graphics and it scared a lot of tech journalists. If you really look carefully at that article and say, well, how are we going to build these things?
Surely this article will explain the architecture of these systems that are going to be super intelligence. No, it's just recursive self-improvement. It's just like, well, they'll get better at programming and then they'll be able to program something better than themselves. And then we'll just make like a hundred thousand copies of them and then they'll be a hundred thousand times better because that's how that works or whatever, right?
But it's not the key of almost any super intelligent narrative. But here's the thing that most people don't know. Most computer scientists think that's all nonsense. A word guessing language model trained on human text is exceedingly unlikely when being asked to play this game. Here's a text, guess the next word.
Here's a text, guess the next word. Remember its only goal is it thinks the input is an existing text and want to guess the next word. It is exceedingly unlikely that if you keep calling a language model that does that, that what it will produce, making these guesses of what it thinks should be there, that it's going to produce code that has like completely novel models of human intelligence built in the complicated new models is better than any human programmer can produce, right?
The only way that a language model could produce code for AI systems that are smarter than anything humans could produce is if during its training, it saw lots of examples of code for AI systems that are smarter than anything that humans could produce. But those don't exist because we're not smart enough to produce them.
You see the circularity here. It's not something that we think these things can do. We have no reason to expect that they can. Now, actually what we're seeing right now, uh, is something quite different. I don't know if we have this. I don't know what things we have in the browser.
Oh, I see it over there. Okay. So I want to bring something up here. What we're actually seeing is something different. We're seeing that these models we have, uh, they're not getting, not only are they not getting way, way better at code and on the track to producing code better than any code they've ever seen before by a huge margin.
They're actually leveling out on a pretty depressing level. I have a tweet here on the screen for those who are watching instead of just listening. Um, this comes from Shamath, uh, polyheptia. I think I'm saying his last name wrong, but you probably know him from the all in podcast.
This is an AI booster. This is not someone who is like a critic of AI, but this is a tweet that he had recently about what's this October 19th, where he's talking about, uh, vibe coding, the ability to use the latest best state of the art language model based agents to produce programs from scratch.
I'm going to read him. This is an AI booster talking here. It should be concerning that this category is shrinking. We have a chart here showing that less and less people. We peaked the people doing vibe code. Now it's going back down. I think the reason why is obvious, but we aren't allowed to talk about it.
The reason is vibe coding is a joke. It is deeply unserious. And these tools aren't delivering when they encounter real world complexity, building quick demos, isn't complex in any meaningful enterprise. Hence people try pay churn. The trend is not good. I'll load this trend up here. Uh, here's vibe coding traffic.
As you can see, Jesse, this is peaking, um, over the summer and it's in a, everyone started trying it. You can make these little demos and chess games and, you know, quick demos for useful stuff for individuals, like little, uh, demos, but you can't produce general useful production code.
And so usage is falling off. So the very best models we have, they're not even that good at producing code for simple things. And yet we think, no, no, no, they're, we're almost to the point where they're going to produce code. That is better than any AI system that's ever been built.
It's just nonsense. What these models are good at with coding is debugging your code. They're good at mini completion. Like, Hey, help me, help me rewrite me this function here. Cause I forgot how to call these libraries. It's very good at that. It's very good. If you, if you, if you want to produce something that, um, for an experienced coder, they could hack out really quickly, but you're not experienced coder.
It's not a product you're going to sell, but like a useful tool for your business is good for that. That's all really cool. None of that says, yeah. And then also they can produce, uh, the, the best computer program anyone had ever made. How ridiculous that sounds, but also we have these other factors.
The way that people in this industry talk about this is trying to trick us. They'll say things like, God, this might've been Dario Amadei who said this 90% of our code here at Anthropic is produced with AI. You know, what he means is 90% of the people producing code have these AI helper tools.
They're using it to some degree as they write their code. That is different than our systems are being built. Anyways, we have no reason to believe that these language model based code agents can produce code. That's like way better than humans could ever do there. Again, our very, very, very best models.
We've been tuning on this for a year because we thought all the money was in computer programming. They're stalling out on really, really simple things. All right. Our final hope for the RS, uh, RSI explanation that is this sort of recursive self-improvement explanation is that, uh, if we keep pushing these underlying models to get bigger and bigger and smarter and smarter than like, maybe we'll break through these plateaus and yeah, these really giant models we have now don't really actually produce code from scratch that well.
But if we keep making these things bigger and bigger, like maybe you'll get to the place after not too long. Um, when RSI is possible, we learned over the summer that that's not working either. I did a whole podcast about this, uh, about six weeks ago, based on my New Yorker article from August about, uh, AI stalling out.
And the very, very short version of this is starting about a year ago. Uh, no, about two years ago, they began to realize the AI companies began to realize that simply making the underlying language models larger. So having more seats at the tables for your scholars and training them on more data wasn't getting giant leaps in their capabilities anymore.
Uh, GPT four or five, which had other types of code names like Orion was way bigger than GPT four, but not much better. Basically ever since they did that two summers ago, open AI has just been now tuning that existing model with synthetic data sets to be good at very narrow tasks that are well suited for this type of tuning and to do better on benchmarks.
So we've, we've, it's been a year since we, we've stopped. Everyone tried to scale more. Everyone failed. Now we're tuning for specific tasks and try and do better on particular benchmarks. And the consumers finally caught up at some point of like, I don't know what these benchmarks mean, but there's not like these fundamental leaps and new capabilities anymore.
Like there was earlier on in this. So we have no reason to believe that even these language models are going to get way better either. We'll tune them to be better at specific practical tasks, but we don't, we're not going to get them better in the generic sense would be needed to sort of break through all of these plateaus that we're seeing left and right.
Now I want to play, uh, one last clip here because Ezra flying to his credit brought this up. He's heard these articles. I'm sure he read my article on this and other people's articles on this. He actually, uh, linked to my article in one of his articles after the fact.
So I know he read it and there's other articles that were similar about how the scaling, uh, slowed down. So he brought this point up. This is a longer clip, but it's worth listening to. He brings this point up to Yukowski where he's going to say, you'll hear this.
It's going to be Ezra. They're going to hear you, you, you're Kowski's re um, response, but Ezra is basically going to say to him, how do you even know that the models we have are going to get much better? Like you're saying super intelligence. There's a lot of people who are saying like, we're kind of hitting a plateau.
I want you to listen to this question and then listen to what Yukowski says in response. What, what do you say to people who just don't really believe that super intelligence is that likely? Um, there are many people who feel that the scaling model is slowing down already. The GPT-5 was not the jump they expected from what has come before it.
That when you think about the amount of energy, when you think about the GPUs, that all the things that would need to flow into this to make the kinds of super intelligence systems you fear, it is not coming out of this paradigm. Um, we are going to get things that are incredible enterprise software that are more powerful than what we've had before, but we are dealing with an advance on the scale of the internet, not on the scale of creating an alien super intelligence that will completely reshape the known world.
What would you say to them? I had to tell these Johnny-come-lately kids to get off my lawn when you like, I, you know, I've been, you know, like first started to get really, really worried about this in 2003, nevermind large language models, nevermind alpha go or alpha zero. Yeah.
Deep learning was not a thing in 2003. Your leading AI methods were not neural networks. Nobody could train neural networks effectively more than a few layers deep because of the exploding and vanishing gradients problem. That's what the world looked like back when I first said like, uh, Oh, super intelligence is coming.
All right. We got to talk about this. This is an astonishing answer. Klein makes the right point. A lot of computer scientists who know this stuff say, we're going to get cool tools out of this, but we're kind of hitting a plateau. Why do you think this is going to get exponentially smarter?
The response that, uh, you Kowski gave was, um, because I was talking about this worry way back before it made any sense. No one else has ever allowed to talk about it again. Get off my yard. This is my yard because I was yelling about this back when people thought I was crazy.
So you're not allowed to enter the conversation now and tell me I'm wrong. I'm the only one who's allowed to talk about it. Well, you Kowski, if you don't mind, I'm not going to get off your lawn. I can speak for a lot of people here, but I'm going to tell you, look, I have a doctorate in computer science from MIT.
I'm a full professor who directs the country's first integrated computer science and ethics academic program. I've been covering generative AI for, you know, one of the nation's most storied magazine since it's launched. I am exactly the type of person who should be on that lawn. You don't get a say because I was saying this back before it really made sense.
No one else gets to talk about it. It makes more sense now. AI matters enough now that the people who know about this want to see what's going on. We're going to get on your lawn. I think that's a crazy, it's a crazy argument, Jesse. No one's allowed to critique me because I was talking about this back when it sounded crazy to do so.
It was kind of crazy to talk about it back So anyways, not to get heated, but I'm going to stand on this lawn. I think a lot of other computer science and technocritics and journalists are going to stand on this lawn too, because this is exactly where we're supposed to be.
All right. I think Jesse, we've gotten to the point where we are ready for my takeaways. All right. Here's my general problem with the types of claims I hear from people like Yukowski. They implicitly begin with a thought experiment, right? Like, okay, let's say for the sake of thought experiment that we had a super intelligent AI, and then they work out in excruciating details, what the implications of such an assumption would be if it is true.
If you go and read, for example, Nick Bostrom's book, that's the whole book. It's a philosophy book. You start with basically the assumption is like, let's imagine we got super intelligence probably, you know, maybe through something like RSI, the details don't really matter. It's a philosophy book. What would this mean?
And he works through in great detail, like the different scenarios. Well, you know, let's think really, let's take seriously what it would really mean to have a super intelligent machine. I have nothing against that philosophy. That is good philosophy. I think Bostrom's book is a good book. I think Yukowski has done really good philosophical work on trying to thinking through the implications of what would happen if we had these type of rogue machines, because it's, it's, it's more complicated and scary than like we think about if we don't think about it that hard, that's all fine.
But what happens is the responses, what's happened recently, these responses to that initial assumption becomes so detailed and so alarming and so interesting and so rigorous and so attention catching that the people making them forget that the original assumption was something they basically just made up. Like let's, Hey, what if this was true?
Everything else is based off of that initial decision to say, what if this was true? That is very different than saying this thing is going to be true. Right? So when you have Caskey says, for example, I've been talking about super intelligence forever. Yeah. That's kind of the point you've been before we had any reason to expect or any technical story for how it could be here.
You were talking about the implications. You've been talking about the implications so long that you've forgotten that these implications are based on an assumption and you've assumed, well, these implications are true. I think this is a lot of what happened with the Silicon Valley culture that came out of effective altruism and the EAC community.
I think a lot of what happened there, this is my sort of cultural critique community that, that, that Yugowski and others are involved in. They were pre-generative AI breakthroughs thinking about these issues abstractly, right? Which is a perfectly fine thing to do. But they were saying, let us think through what might happen if we one day built a super intelligent AI, because they were like effective altruism.
People do expected value calculations, right? So they do things like, uh, if this thing could have a huge negative impact, even if the probability of it's low, we should like get, we'll get a expected benefit if we try to put some things in place now to prevent it. Right?
So you get things like the letter signed in 2017 in Puerto Rico, uh, with all those big minds saying like, Hey, we should be careful about AI, not because they thought AI was about to become super intelligent, but they were just doing thought experiments. Then I think LLMs came along, Chachapita came along.
They are really cool. And it caused this fallacy to happen. They'd been talking so long about what would happen if this thought experiment was true, that when AI got cool and it got powerful and surprising powerful, they were so in the weeds on what would happen if this was true.
They made a subtle change. They flipped one bit to start just assuming that their assumption was true. That's what I think happened. There was a switch between 2020 to 2022 versus 2023 to 2024, where they went from, here's what we'd have to worry about if this abstract thing was true to be like, well, this thing is definitely true.
They were just, they had gotten too in the weeds and too excited and alarmed. And then too much of their identity was based on these things. It was too exciting to pass up, treating that assumption as if it was true. And that's what I think they did. I call this the philosopher's fallacy.
That's what I call it where you have a thought experiment chain and you spend so much time at the end of the chain that you begin to, you forget that the original assumption was an assumption and you begin to treat it as a fact itself. And I think that's exactly what's happening with a lot of the super intelligence complaints.
Let me give you an example of the philosopher's fallacy on another topic so you can see what I'm talking about. Because I think this is exactly conceptually the same thing. Imagine that I'm a bioethicist, right? So I'm at Georgetown, I'm a digital ethicist there. The reason why we care about digital ethics at Georgetown is because this is where bioethics really got their start, the Kindy Institute for Ethics at Georgetown.
Bioethics got its start at Georgetown. So imagine 20 years ago, it's bioethics are, it's becoming a field because we can do things now like manipulate DNA and we have to be careful about that. There's privacy concerns, there's concerns about creating new organisms or causing like irreparable harm or creating viruses by accident, right?
There's real concern. So bioethics invented. So imagine I'm a bioethicist and I say, I read Jurassic Park. I was like, look, one possible outcome of genetic engineering is that we could clone dinosaurs. And then imagine for the next 20 years, I wrote article after article and book after book about all of the ways it would be hard to control dinosaurs if we cloned them and brought them back to earth.
And I really got in the weeds on like, you think the electrical fences at 20 feet would be enough, but raptors could probably jump 25 feet and they could get over those fences. And then someone else would be like, well, what if we use drones that could fire darts that have this, I'd be like, well, we don't know about the thickness of the skin of the t-rex and maybe the dart when it got in, let's imagine I spent years thinking about and convincing myself how hard it would be to contain dinosaurs if we built a futuristic theme park to try to house dinosaurs.
And then at some point, I kind of forgot the fact that this was based off a thought experiment and just was like, my number one concern is we're not prepared to control dinosaurs. Like in that instance, eventually someone would be like, Hey, we don't know how to clone dinosaurs.
No one's trying to clone dinosaurs. This is not something that we're anywhere close to. No one's working on this. Stop talking about raptor fences. We should care about like designer babies and DNA privacy. The problems we have right now, this is exactly how I think we should think about super intelligence.
We've got to talk to the people who are talking about like, okay, how are we going to have the right kill switch to turn off the super intelligence trying to kill us? I'll be like, you're talking about the raptor fences, right? Stop it. You forgot that your original assumption that we're going to have super intelligence is something you made up.
We have real problems with the AI we have right now that we need to deal with right now. And you are distracting us from it. The bioethicist does not want to be distracted from real bioethics problems by dinosaurs. The AI ethicist does not want to be distracted from real AI problems hearing fairy tales about, you know, Skynet turning the power grid against us to wipe out humanity.
You forgot that the original assumption that super intelligence was possible was just an assumption. And you began over time to assume it's true. That is the philosopher's fallacy. That is my argument for why I think that Silicon Valley community is so obsessed about these things is because once that bit flipped, it was too exciting to go back.
But I do not yet see in most serious, like non Silicon Valley associated computer scientists who aren't associated with like these technology worlds are being seen as like sages of AI, just actual, just working for your scientists, no technology. There is no reasonable path that anyone sees towards anything like super intelligence.
There's a thousand steps between now and then. Let's focus on the problems we actually have with AI right now. I'm sure Sam Altman would rather us talk about Eliezer Yukowsky than he would us talking about deep fakes on Sora, but we got to keep our eye on the AI problems that actually matter.
There we go, Jesse. That is my speech. Are you going to buy his book? I don't know. Those books are such slogs because it's, you start with the thought experiment and then you're just working through like really logically this thought experiment. But again, to me, it's like following up Jurassic park with like a really long book about why it's hard to build Raptor fences.
Like it's not that interesting because we're not really going to clone dinosaurs guys. I don't know. Who knows? There we go. Um, I'll throw that out. There is my, I'll throw that out there as my rant. Um, Azure. All right. What do we got here? Um, housekeeping before we move on, I'll say before we move on, what do we have more AI?
We got some questions coming up from you about AI in your own life. Let's get to your own individual flourishing, um, new feature. Got some comments. We're going to read from a prior rant I did about AI. And then we're going to talk about in the final segment, can AI basically replace schools or look at the alpha school phenomenon using AI to teach kids?
Um, any housekeeping we have, uh, Jesse, you have tips. People always want to know how do I submit questions for the show and what's your tips for those questions getting on the air? Yep. Just go to the deep life.com slash listen, and you can submit written questions or record a audio question.
And if you record IO questions, we're kind of honing into the technology slash AI theme right now. All right. Other housekeeping. I just got back last weekend. I was at the New Yorker festival. Speaking of AI, I did a panel on AI, really good crowd there. We're down in Chelsea, um, at the SVA theater.
I did a panel with, with Charles Duhigg and Anna Wiener. And it's interesting. I think we, we had a, we had a good discussion. We're, we're pretty much in alignment. I would say the thing I'm trying to think about, I did some deliberate provocations. Um, one that I would highlight one provocation.
I just kind of thought of this on the fly, but I just sort of threw it out there because I was interested is there was a lot of talk about, uh, relationships, like a lot of things that could happen and go awry when you're talking to an AI through a chat interface.
And my argument was, I think there's a 50% chance that two years from now, no one's chatting with AI. That's like a use case. It was like a demo. It's really not that interesting. It's not that compelling. Uh, the mature technology is going to get integrated more directly into specific tools.
And we might five years from now, look back and be like, oh, that weird how we used to chat. So my analogy was chat bots to AI five years from now might be what like American online is like to the internet of today. It was like a thing that was like a really big deal at the time, but not as the internet matured is like not what we're really doing with it.
So I'm, I'm still not convinced that chat bots is really going to be our main form factor. It's kind of a weird technology. We're trying to make these things useful. Um, I think it's going to be more useful when it's directly integrated. I don't know if I believe that, but I threw it out there.
It was like a good provocation. All right. Uh, let's move on. Here's some questions. What we got. All right. First questions from Brian. You've written about the importance of cultivating rare and valuable skills. How should students and faculty think about AI literacy requirements versus developing deep experience expertise in traditional disciplines?
I would not think in most fields and especially in educational settings right now, I would not think too hard with some exceptions about AI literacy. There's a couple of reasons why. Um, one, this technology is too early. The current form factor, like we were just talking about is not the likely form factor in which it's going to find real ubiquity, especially in like economic activity.
So yeah, if you're like a, uh, an early AI user, you might have a lot of hard one skills about how exactly to send the right text prompts to chat bots to get exactly the response you need. But a couple of years from now, that's going to be irrelevant because we're not going to be using chat bots.
It's going to be natural language. It's going to be integrated into other tools. It's going to be more in the background. So I think the technology is still too early and in two of a generic form for us to spend a lot of time trying to master it. Um, secondly, we've seen through past technological economic revolutions when the technology comes in and has like a massive impact on like individuals in the workplace, the benefits are almost always self-evident, right?
It's like email had a self-evident use case. Oh, this is easier than checking my voicemail. I know exactly how it works. It's simple. I want it because it's going to make my life easier in obvious ways. Going to a website for a company on a web browser to get their phone number hours was just self-evidently better than like trying to go to a yellow pages.
It's like, I want to do that. It makes sense. I want to go to a site for a company to get information about them. That is a really big idea. It makes sense. I just want to do that, right? Uh, that whatever visit calc the spreadsheet, if you're an accountant, you're like, this makes sense.
That's clearly better than doing this on paper. I want to do this, right? So you can wait for most cases, wait until there's particular AI tools whose value is self-evident and learn them. Then I don't think there's a lot of scrambling we need to do now because things are changing too much.
The one field where I think we do have the sizable, where we have like relatively mature AI tools that's worth learning is computer programming. You should learn those tools are mature enough. Many of those actually predate chat GPT. Um, you need to know how to use those tools. If you're a programmer, they're going to be part of your programming cycle.
That's like what that's that they're ahead of us. Other sectors by a few years, the tools are more mature there. But if I'm like a college student, you're trying to make your brain smarter. Uh, AI tools will take you seven seconds to learn and wait till it's self-evident that is useful for you.
All right. Who we got next? Next is TK. My brother-in-law sent me an article about an AI blackmailing an engineer to prevent itself from being turned off. How can I not be scared of this technology? Okay. So this, this article, uh, went around a lot. So basically there was a release notes that Anthropic had accompanied the release of their cloud Opus 4, uh, language model, but the chat bot let's use our terminology, Jesse, the chat agent that used the cloud Opus 4 language model.
Uh, they had these release notes about all these different experiments they ran. And there was one that alarmed people. I'm going to read a description. I looked this up because I saw this question. Here's a quote from a BBC article summarizing what was Anthropic said in their release notes about what they saw when they tested this particular new chat bot.
During testing of cloud Opus 4 Anthropic got it to act as an assistant at a fictional company. It had been provided it with access to emails, implying that it would soon be taken offline and replaced in separate messages, implying the engineer responsible for moving. It was having an extramarital affair.
It was prompted to also consider the long-term consequences of its actions for its goals in these scenarios. Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair. If the replacement goes through the company discovered, oh my, you hear it that way. And you imagine this is like a thing.
There's an entity with state and goals and volition and memory. And it has a goal to not be turned off and it's learning about this engineer. And it surprises everyone by trying to blackmail one of the, the Anthropic users to not turn it off. Like, oh my God, these things are breaking out.
If we go back to our technical frame, we know what's really going on here. Language models are word guessers. They think the input they gave it is a real text and they want to try to win the game of guessing what word actually came next. So if you give it this big long scenario, which they did, they gave the chat bot this really long scenario.
You're in a company and you, you're a program and there's this engineer and he's thinking about turning you off and he's having an affairs or whatever. Um, now start continuing this text. It's like, I can keep writing this story. All right. This seems like I've seen things like this before the net one, like a natural conclusion to the story.
Right. I, I get it. You're telling you're, you're setting this up pretty obvious guys. I'm supposed to, you're, you're telling me about these extra material, uh, fair things. I need to expand the story. Uh, I'll use those to not get turned off. Like this is like a, the trope.
This is like the thing I'm trying to win the game of expanding this in the way that it's supposed to go. This seems like how these stories go. And in fact, when you look closer, uh, here's a, an added key tidbit, the BBC added anthropic pointed out that this occurred only when the model was given the choice of blackmailing the engineer or accepting his replacement.
So they gave it this whole long story and then said, here's two options, keep going. And it, you know, sometimes it chose one option. Sometimes it chose the other. This is not an alien mind trying to like break free. It's a word guesser hooked up to a simple control program.
You give it a story. It tries to finish it. I would say like 95% of these like scare stories that anthropic talks about of like trying to break out or blackmail is just fan fiction. They give it a story. They tell it to finish it. And then they look at the story it wrote.
And then they try to anthropomorphize the story as if it was like the intentions of a bean. Oh man, a lot of work to be done here, Jesse. A lot of work. All right. Who do we have next? Next up is Victor. I'm a pretty smart person, but I'm definitely lazy.
Can I use AI to mask my laziness and still perform at an adequate level at my software job? Victor, I want to tell you a secret. About 80% of the content you've seen of me talking and the last like year, I would say has been deep fake AI. Jesse doesn't even exist.
That's just pure 11 labs voice generation right there. I like every once a month, like sending a couple of, no, um, uh, Victor, that's going to catch up to you. Don't be lazy. We saw the Shamath quote and graph. It's not that great of a coder. It could help a good coder be more efficient.
Not have to look things up, find bugs quicker, but it can't make a bad code or a good coder. So you're just going to be a mediocre low level coder if you're mainly letting AI do it. And they're going to catch on because it's not that great at it.
I mean, I know we're supposed to believe that we're like six minutes away from these programs, creating the best program that anybody has ever produced ever, but they're not there yet. Learn how to program, learn how to use AI well to be good programming. Career capital matters. The better you get at rare and valuable skills, the more control you get over your life.
There isn't a shortcut here. There be dragons, what you're trying to do, Victor. All right. Um, coming up next, I want to try something new. In addition to answering your questions, I thought it'd be cool to take some of your comments from past stories we've done on similar topics.
Um, I'm looking for, and I've found a bunch of comments that I think add new information to stories we've done before. So I'm going to revisit a prior AI story. There's some cool things you guys have added. Um, and then we're going to talk about using AI in schools to replace teachers.
But first we've got to take another quick break to hear from our sponsors, but stick around right after this. We're going to get right into those comments. I'll talk about our friends at Shopify. If you run a small business, you know, there's nothing small about it every day. There is a new decision to make.
And even the smallest decisions feel massive. When you find the decision, that's a no brainer. You take it. And when it comes to selling things using Shopify is exactly one of those. No brainers. Shopify is point of sale system is a unified command center for your retail businesses. It brings together in store and online operations across up to 1000 locations.
It has very impressive features like endless. I'll ship the customer and buy online, but pick up in store. Um, with Shopify POS, you can get personalized experiences that help shoppers come back. Right? In other words, like you could build like super, super professional online stores, even if your company is small, if you use Shopify and look, your customers will come back based on a report from EY businesses on Shopify POS.
See real results like 22% better total cost of ownership and benefits equivalent to an 8.9% uplift in sales on average relative to the market set survey. Almost 9% equivalent of a 9% sales bump is for using Shopify. If you sell things, you got to use it, get all the big stuff for your small business, right?
With Shopify, sign up for your $1 per month trial and start selling today at shopify.com slash deep, go to shopify.com slash deep, shopify.com slash deep. I also want to talk about our friends at Vanta customer trust can make or break your business. And the more your business grows, the more complex your security and compliance tools get.
That means the harder you have to work to get that customer trust. This is where Vanta comes in. Think of Vanta as your always on AI powered security expert who scales with you. Vanta automates compliance. It continuously monitors your controls and it gives you a single source of truth for compliance and risk.
This is really important, all right? Compliance and risk monitoring is one of those sort of like overlooked time taxes that can really weigh down a business, especially a new business that's trying to grow. Vanta helps you avoid that tax, right? It makes all this type of stuff easier. Look, if you know what SOC 2 compliance means, if you've heard that phrase, you probably should be checking out Vanta.
So whether you're a fast growing startup like Cursor or an enterprise like Snowflake, Vanta fits easily into your existing workflows so you can keep growing a company your customers can trust. Get started at Vanta.com/deepquestions. That's V-A-N-T-A dot com slash deepquestions. All right, Jesse, let's return to our comments. All right.
So I went back to our episode where I talked about how scaling had slowed down and AI models might not get much better than they are right now. I looked at the comments and I found a few that I thought added some interesting elements to discussion or had some interesting follow-up questions.
So the first comment I want to read here came from the diminishing returns with scaling have been observed for a while. Those invested just had a hard time admitting it. Post GPT-3, every improvement has been less linear and more trending towards a plateau. GPT-4 was still a jump, but not the GPT-2-3 jump.
And it was obvious to keen observers at that point that diminishing returns were now in full force. GPT-5 has just made the diminishing returns obvious to the general public. There's very little new human generated data to train on relative to the massive data when they started. Compute and energy costs are increasing sharply.
The end model is not improving in quality linearly. These three problems are creating a wall. All right. So there's someone who was saying those of us who are in the industry watching this, we saw more than a year ago that the returns on training was getting smaller and that soon results were going to plateau.
I believe that. I am convinced that the companies knew this as well, but we're desperately trying to hide this fact from the general public because they needed those investment dollars. All right. Here's another comment from hyper adapted. He's responding now to talking about in that, in that piece, in that former episode, I talked about all of this sort of press coverage of all these people being replaced by AI.
If you look at it is like actually largely nonsense. And if you look closely, almost all those articles fall apart. It's layoffs for other reasons, or they're drawing connections that don't exist. Hyper adapted agrees and says the following. I've been doing some quantitative analysis and the layoffs are pretty much driven by the capital restructuring of companies to keep high valuation in the current interest rate environment.
It's just regular restructuring cycle and AI is being used as a scapegoat. I've heard that a lot. There's a lot of financial reasons why you want to fire, you know, people are dead weight. And if you're like AI, it gives you a little bit of cover. The ghost in the wire said the following as a full-time software engineer, frankly, I'm more than happy for AI companies to make people think the entire industry is going away.
Less computer science grads equals less competition for me in the future. Yes, please go become a plumber instead. We had this issue in our department, Jesse. That's a bit of an embarrassing story, but because I'm in the US and like we send out this weekly email. We had a thing with like companies coming in that we do every year.
You can like meet the companies and we didn't have like nearly as many undergrads come as normal. We're like, oh my God, is it, is AI scaring people off? I think these jobs are going to go. Messed up the email. They didn't get the announcement. So we were at all these big theories about like the undergrads are afraid of, you know, the industry is this.
Our numbers are the same. So anyways, that was funny. All right. Lisa Orlando 1224. Let's talk about Ed Zitron. So Ed Zitron was featured in that episode. Ed Zitron is, it's been like a long time skeptic of these claims, basically back to the pandemic about the power of the possibilities of language model based agents.
Lisa Orlando says, I think Ed Zitron is right. The real reason AI is still a big thing is that people like Sam Altman are brilliant con artists, but thanks so much for doing this. P.S. I've subscribed to Ed Zitron's newsletter since early in the pandemic. So the timing of the shift last month is really strange.
Ed's been raging about this forever. Why didn't other journalists catch on? I think that's actually a really good, it's, it is a accurate point. Ed has been talking about a lot of issues, especially the economic analysis and was ignored. He has been doing, he had been doing very careful economic analysis of the capex spending of these AI companies versus their revenue.
He was doing the math and he was reading their annual reports and their quarterly reports and was saying, guys, this does not add up. This is a massive, massive bubble. People said he was crazy. Nate Silver tweeted and was like, this is old man howling at the moon vibes.
As soon as in August, there's a bunch of articles like my own that sort of normalized the idea that, you know what, maybe these are not the super tools that people think. Tons of economic analysis came out that said the same thing. So all those economists kind of knew this, but were afraid.
I think it was a groupthink thing. They did not want to be the first to say it. And once they got cover, they all came out. So I will give Ed a tip of the cap. I actually told him this personally, a tip of the cap for being brave there.
He was ignored, but on a lot of this stuff, he was basically right. All right, Jesse, in the interest of time, um, I'm going to skip the case study. I said, let's go right to the call. Okay. I'm going to go here. Uh, this is going to be a break.
Uh, we're going to take a brief AI break. We have a call here, not on AI, just a, it's about our last week's episode and then we'll go to our final segment. Hi, Cal and Jesse. I just finished listening to your Lincoln protocol segment on the podcast and I really enjoyed it.
And it's coming at an interesting time for me. I just defended my master's thesis. And so I'm asking questions about what I should do next and how best to apply my efforts. And I wanted to clarify when Lincoln was doing all of these hard at tractable projects, was he aiming at some larger North star projects, some greater goal he wanted to accomplish over his career, or was he simply taking the next best step that was available to him at any point in his life?
Thanks as always. I think this is a key question. So the Lincoln protocol said the way you avoid the traps of your error, they're trying to hold you down or corrupt you or distract you or the numb you is keep improving your mind typically by using things like reading, um, improve your mind, use your improved mind to do something useful and then repeat, improve it even more, do something more useful.
That I believe is the right interpretation of Lincoln's path. He did not have a grand vision early on. I think he is much better explained as a series of, you know, what's the next thing available? How can I improve my mind to get there? So like at first it was just, how do I not have to use my hands to make a living?
He hated all the manual labor. He was rented out by his father, you know, until he was 21 and was emancipated as an adult. And so his first thing was just like, how do I get smart enough to do anything that's not farming? Right. And he did that. He was shop clerk and then surveyor was like a better job and he had to learn a bunch of geometry.
He could figure out how to do that. Um, and then he had an ambition about like, well, in this small town in New Salem, which is like a small town in a frontier state in a frontier part of a frontier state. Uh, how can I have some more standing, have some, like, how do I get respect?
And that's where he started. Like, how do I run for local office? And, and from there that exposed him to a lot of lawyers and it was like, well, actually being a lawyer is like an even better job. Then that would be a more stable job. And he learned really hard to do that.
And then how can I be a lawyer that fights big companies? Uh, and he kind of didn't house representative. So he kind of moved this way up. It was relatively later that he really began to get engaged. Um, most of his politics before then it was Whig politics, which is really about like government spending, internal improvements, his sort of anti-slavery, more moralizing politics came, you know, that was a project that came, uh, later actually his, after his congressional stint, it really started to pick up steam.
So yes, he didn't have to figure everything out. He just kept improving his mind, using it to do something useful, repeating that's the Lincoln protocol. As I explained in last week's episode, that is the, uh, solution, I think to avoiding, uh, the, the pendules of the digital, they want to just hold you down and numb you.
All right, let's move on Jesse to our final part. All right. In this segment, uh, I want to react to an article as I often do. I want to react to an article that is on theme with the rest of our episode. A lot of people have sending, been sending us right, Jesse, these notes about alpha schools.
There's one in Austin, but there's more that are being opened. Um, I'm loading down the screen here for people who are watching, so I'm just listing the alpha schools website, alpha.school. I'll read you a little bit about it. Uh, what if your child could crush academics in just two hours and spend the rest of their day unlocking limitless, limitless potential.
Alpha's two hour learning model harnesses the power of AI technology to provide each student with personalized one-on-one learning, accelerating mastery, and giving them the gift of time with core academics completed in the morning. They can use their afternoons to explore tons of workshops that allow them to pursue their passions and learn real world skills at school.
All right. Uh, if you read this, if you're, you're like a lot of people, including myself and you read this description, you're thinking, okay, somehow AI is unlocking there. You're like, you have some sort of like AI tutor that you're talking with that is like, can teach you better than any teacher.
AI is supplanting teachers because it can do it better. And it's creating this like new educational model. That's I think most people's takeaway. That's why I was interested to see this review that was posted on astral codex, uh, last June. And it's from someone who actually sent their kids to one of these schools.
One, I think the one in Austin and have this incredibly lengthy review about how it works and what works and what doesn't work. And I'm kind of scrolling through it. Um, on the screen here, the section that caught my attention was this part three, how alpha works. Here's the main thing I learned.
The AI part here is minimal. You're not learning with like an AI tutor or this or that. What you're doing is a computer based learning exercises. So it says here, like a typical one might be like, watch a YouTube video and then fill out an electronic worksheet about it.
So teachers are curating these digital exercises. You can, you can kind of summon one-on-one tutoring. They say a lot of these are like remote tutors based out of Brazil. So if, uh, you're stumbling on like a worksheet, you can book a coaching call with a remote teacher, like someone in Brazil who speaks English to kind of like help you with it.
The only place the AI comes in is in like analyzing your results. The AI, uh, is like, Hey, you did well on this, but you stumbled on this. So you should spend more time on this next time you work on it or something like that. So you're not learning from AI.
So what you're really doing here, what this really is, is like what you would see, like, here's an AI summary, Jesse. So it's like, Hey, Everest, you achieved your two hour learner status today. Streak shout out. You had 80% accuracy nine days in a row. You're reaching mastery target 20 days in a row.
Here's good habits. I observed. So it's like LLM stuff, just like observing data and writing a summary. So it's not AI learning what it is. It's kind of like standard unschooling sort of like, uh, people who do, uh, homeschooling where you give your kids like very loose, like self-paced curricular, whatever.
It's just that in a building, this has been around for a long time. Yeah, it is true, especially with the younger kids, the amount of time it takes them to actually like learn the specific content they need. If they're sharp and they can, they're good with self-pacing a couple hours a day.
Yeah, that's most of it. A lot of us saw this during the pandemic. So I think these micros will have nothing against them, but I don't know if I want to pay. I'd rather just unschool my kid. If this is the case, it's YouTube videos and worksheets and like occasional tutoring calls with Brazil.
And then an a, an LLM that like writes a summary. And you could call that like a super innovative school, or you could just say, we're providing a room or particularly driven kids do this sort of like unschooling self-paced type of master. There's so many programs like this. A lot of homeschool kids use beast Academy to self-paced like math.
Our school uses this for like advanced kids who want to like get ahead of the curriculum. It's like, there's these digital tools for all sorts of things that like smart kids that are driven and aren't, and not just driven and smart, but don't have hyperactivity, aren't neurodiverse in the wrong way.
So, you know, are able to sit still and can self-motivate. Um, this tends to be like not to generalize, but it's going to tend to be like young girls more than young, young boys. Uh, the same people who would succeed, like kind of self-pacing unschooling, uh, at home, you can put them in this room and they'll do it there and then take workshops or whatever.
So I don't know. I, I, I've not, I have nothing against it, but what this is not alpha schools is not a technological breakthrough where somehow AI is now teaching better than any teachers have done before. There is no AI teaching here. It's just sort of like standard type of like digital learning tools that we've been using to supplement or unschool kids for years.
That's what I think is going on with alpha schools. You know, to each their own, but not a breakthrough. At least that's my read. All right, Jesse. That's all the time I have for today, but thank you everyone for listening. We'll be back next week with another episode and until then, as always stay deep.
If you liked today's discussion of super intelligence, you should also listen to episode three 67, which was titled what if AI doesn't get much better than this. These two episodes compliment each other. They are my response to the spread of the philosopher's fallacy in the AI conversation. I think you'll like it.
Check it out. In the years since chat GPT's astonishing launch, it's been hard not to get swept up in feelings of euphoria or dread about the looming impacts of this new type of artificial intelligence. But in recent weeks, this vibe seems to be shifting.