
Microsoft Promises a 'Whale' for GPT-5, Anthropic Delves Inside a Model’s Mind and Altman Stumbles



00:00:00.000 | While Microsoft spend billions on shipping a whale-sized GPT-5,
00:00:05.920 | OpenAI gets tossed about in a storm of its own creation.
00:00:10.860 | Meanwhile, Google revealed powerful new details
00:00:13.920 | about the Gemini models that many will have missed.
00:00:17.560 | And then it was just yesterday that Anthropic showed us
00:00:20.720 | how they are the closest to understanding
00:00:24.440 | what goes on at the very core of a large language model.
00:00:28.720 | But I want to start with Kevin Scott, the CTO of Microsoft,
00:00:33.120 | who said something which, if true,
00:00:35.760 | is the biggest news of the week and even the month.
00:00:39.640 | According to him, we are not even close
00:00:42.000 | to diminishing returns with the power of AI models.
00:00:45.760 | Since about 2012, that rate of increase in compute
00:00:50.440 | when applied to training has been increasing exponentially.
00:00:54.280 | And we are nowhere near the point
00:00:56.560 | of diminishing marginal returns on how powerful
00:00:59.440 | we can make AI models as we increase the scale of compute.
00:01:03.240 | As we'll see, Kevin Scott knows both the size
00:01:05.920 | and power of GPT-5, if that's what they call it.
00:01:09.440 | So these words have more weight than you might think.
00:01:12.280 | And while we're speaking of exponentials,
00:01:14.120 | AI models are undeniably becoming faster and cheaper.
00:01:18.720 | While we're off building bigger supercomputers
00:01:21.640 | to get the next big models out
00:01:23.680 | and to deliver more and more capability to you,
00:01:25.920 | like we're also grinding away on making
00:01:29.120 | the current generation of models much, much more efficient.
00:01:32.840 | So between the launch of GPT-4,
00:01:35.520 | which is not quite a year and a half ago now,
00:01:38.600 | it's 12 times cheaper to make a call to GPT-4o
00:01:42.800 | than the original GPT-4 model.
00:01:46.160 | And it's also six times faster
00:01:47.960 | in terms of like time to first token response.
00:01:51.040 | - On this channel, admittedly, I am laser focused
00:01:53.800 | on the growing intelligence of models,
00:01:56.560 | but this massive drop in cost
00:01:59.400 | does have some pretty profound ramifications too.
00:02:02.200 | It is kind of an obvious point,
00:02:03.800 | but when we get the first generally intelligent AI model,
00:02:07.360 | we will soon get ubiquitous AI models.
00:02:10.720 | Unless it gets monopolized, artificial intelligence,
00:02:13.640 | if it carries on getting cheaper and cheaper,
00:02:16.080 | could become absolutely pervasive
00:02:18.640 | inside your toaster and security camera,
00:02:21.280 | not just your laptop.
00:02:22.440 | Anyway, I promised you a whale analogy and here it is.
00:02:26.120 | - There's this like really beautiful relationship right now
00:02:29.320 | between the sort of exponential progression of compute
00:02:31.800 | that we're applying to building the platform,
00:02:34.440 | to the capability and power of the platform that we get.
00:02:37.720 | And I just wanted to, you know, sort of without,
00:02:40.120 | without mentioning numbers, which is sort of hard to do,
00:02:44.880 | to give you all an idea of the scaling of these systems.
00:02:49.480 | So in 2020, we built our first AI supercomputer for OpenAI.
00:02:54.480 | It's the supercomputing environment that trained GPT-3.
00:02:59.080 | And so like, we're gonna just choose marine wildlife
00:03:03.240 | as our scale marker.
00:03:04.880 | So you can think of that system about as big as a shark.
00:03:09.840 | So the next system that we built,
00:03:13.400 | scale-wise is about as big as an orca.
00:03:16.960 | And like, that is the system that we delivered in 2022
00:03:20.880 | that trained GPT-4.
00:03:23.000 | The system that we have just deployed is like scale-wise,
00:03:28.640 | about as big as a whale relative to like, you know,
00:03:32.120 | the shark-sized supercomputer
00:03:33.800 | and this orca-sized supercomputer.
00:03:35.680 | And it turns out like you can build a whole hell of a lot
00:03:37.880 | of AI with a whale-sized supercomputer.
00:03:40.200 | Just want everybody to really, really be thinking clearly
00:03:43.360 | about, and like, this is gonna be our segue
00:03:46.280 | to talking with Sam, is the next sample is coming.
00:03:49.800 | So like, this whale-sized supercomputer
00:03:52.120 | is hard at work right now,
00:03:53.920 | building the next set of capabilities
00:03:55.880 | that we're going to put into your hands
00:03:58.280 | so that you all can do the next round
00:04:00.760 | of amazing things with it.
00:04:02.360 | - As for the actual release date
00:04:03.800 | of this mysterious whale-sized model,
00:04:06.400 | Sam Altman would give no hint and Kevin Scott
00:04:09.480 | just described it as being within K months.
00:04:12.760 | On a quick side note, when one commenter said
00:04:15.200 | that GPT-4o, as good as it is,
00:04:17.400 | shows that OpenAI simply don't know
00:04:19.760 | how to produce further capability advances,
00:04:22.120 | that they can't do exponential improvements,
00:04:24.360 | and that they don't have GPT-5 even after 14 months of trying,
00:04:28.400 | the response from the head of Frontiers Research at OpenAI
00:04:32.560 | was, "Remind me in six months."
00:04:35.360 | I'm gonna leave OpenAI for a moment
00:04:37.360 | because I want to focus this video on Google and Anthropic
00:04:41.400 | who have both shipped very interesting developments.
00:04:45.240 | And first I want to focus on Google
00:04:47.160 | because I really feel like they buried the lead
00:04:49.920 | at the recent Google I/O event.
00:04:52.160 | They made, by my count, 123 mentions of AI,
00:04:55.920 | but didn't detail the improvements
00:04:58.000 | to their impressive Gemini 1.5 Pro.
00:05:01.000 | And they barely mentioned Gemini 1.5 Flash,
00:05:04.080 | which was trained in part
00:05:05.960 | by imitating the output of Gemini 1.5 Pro.
00:05:09.640 | The weird thing for me is that I had already read
00:05:11.720 | the 100 plus page Gemini report and done a video on it,
00:05:15.480 | but this refreshed report was so interesting,
00:05:19.120 | I counted a dozen new insights.
00:05:21.640 | I'm only gonna talk about around five today,
00:05:23.760 | otherwise this video would be way too long,
00:05:26.320 | but I will be coming back to this paper.
00:05:28.560 | The first thing to know is that you can already play about
00:05:31.360 | with these models in the Google AI Studio.
00:05:34.200 | Both Gemini 1.5 Pro and 1.5 Flash accept video input, image input,
00:05:38.560 | and text input, up to, for now, 1 million tokens.
00:05:42.120 | That's way more than GPT-4o.
00:05:44.520 | Admittedly, Gemini 1.5 Pro does not have the rizz of GPT-4o,
00:05:49.520 | but there are prizes for making impactful apps with it.
00:05:53.560 | Back to the highlights of the paper though,
00:05:56.080 | and page 43, which I found really interesting.
00:06:00.360 | If you've been following the channel for a while,
00:06:02.600 | you'd know that adaptive compute,
00:06:05.320 | or essentially letting the models think for longer,
00:06:08.080 | is a very promising direction
00:06:10.640 | in advancing the intelligence of models.
00:06:12.960 | Well, this update to the paper was the first time
00:06:16.200 | I saw it in action with a current
00:06:19.000 | state-of-the-art large language model.
00:06:20.920 | Google wanted to understand how far they could push
00:06:25.280 | the quantitative reasoning capabilities
00:06:27.600 | of large language models,
00:06:28.960 | and they describe how mathematicians often benefit
00:06:31.900 | from extended periods of thought or contemplation
00:06:35.720 | while formulating solutions.
00:06:37.580 | And critically, they aim to emulate this
00:06:41.920 | by training a math-specialized model
00:06:44.680 | and providing it additional inference time computation,
00:06:48.160 | allowing it, they say,
00:06:49.000 | to explore a wider range of possibilities.
00:06:51.600 | If you want more background, do check out my Q* video,
00:06:54.320 | but if this general approach works,
00:06:56.440 | it means you could potentially squeeze out
00:06:58.920 | orders of magnitude more intelligence
00:07:01.200 | from the same size of model.
00:07:03.040 | Remember too that any improvements during inference,
00:07:06.000 | when the model is actually outputting tokens,
00:07:08.420 | would be complementary to,
00:07:10.140 | that is in addition to, improvements derived from scale,
00:07:13.620 | aka growing the models into giant whales.
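To make that adaptive compute idea concrete, here is a minimal sketch of one common way to spend extra inference-time computation, self-consistency sampling. The Gemini report does not spell out the exact mechanism, so treat this as an illustrative assumption rather than Google's implementation; `sample_fn` is a hypothetical stand-in for a single LLM call.

```python
# Illustrative sketch of "letting the model think for longer" via
# self-consistency: sample many independent solutions, keep the majority answer.
# An assumption for illustration, not Google's actual recipe.
from collections import Counter
from typing import Callable

def solve_with_extra_compute(problem: str,
                             sample_fn: Callable[[str], str],
                             n_samples: int = 32) -> str:
    # More samples = more inference-time compute spent exploring possibilities.
    answers = [sample_fn(problem) for _ in range(n_samples)]
    # Majority vote over the final answers.
    return Counter(answers).most_common(1)[0][0]
```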
00:07:16.660 | So what were the results?
00:07:18.460 | Well, we got a new record score
00:07:20.820 | on the MATH benchmark of 91.1%.
00:07:24.540 | So impressive was that to many
00:07:26.660 | that the CEO of Google, Sundar Pichai, tweeted it out.
00:07:30.300 | With that particular result though,
00:07:32.620 | there is a slight asterisk
00:07:34.740 | because the benchmark itself,
00:07:36.900 | surprise, surprise, has some issues.
00:07:39.260 | If you want to know more about those issues
00:07:41.460 | and my first glimpse of optimism for benchmarks,
00:07:45.180 | do check out the AI Insiders tier on Patreon.
00:07:48.460 | Making that video was almost cathartic to me
00:07:50.860 | because by the end, for the first time,
00:07:52.980 | I actually had hope
00:07:54.180 | that we could benchmark models properly.
00:07:56.700 | And while you're on Insiders,
00:07:58.340 | if you use AI agents at all in enterprise
00:08:01.980 | or are thinking of doing so,
00:08:03.740 | do check out our AI Insider resident expert,
00:08:06.900 | Donato Capitella, on prompt injections
00:08:09.900 | in the AI agent era.
00:08:11.820 | The effect of that extra thinking time though,
00:08:14.420 | was pretty dramatic for other benchmarks too,
00:08:17.500 | especially if you compare the performance
00:08:19.820 | of this math-specialized 1.5 Pro to, say, Claude 3 Opus.
00:08:24.580 | Of course, I wish the paper gave more details,
00:08:27.260 | but they do say the increased performance
00:08:29.580 | was achieved without code execution,
00:08:31.700 | theorem-proving libraries, Google Search or other tools.
00:08:34.700 | Moreover, the performance is on par
00:08:36.620 | with human expert performance.
00:08:39.060 | Very quickly, before I move on from benchmarks,
00:08:41.740 | it would be somewhat remiss of me
00:08:43.780 | if I didn't point out the new record in the MMLU.
00:08:47.380 | Now, yes, it used extra sampling
00:08:49.420 | and the benchmark is somewhat broken,
00:08:51.860 | but in previous months,
00:08:53.420 | a score of 91.7% would have made headlines.
00:08:57.420 | It must be said that for most of the other benchmarks though,
00:09:00.500 | GPT-4o beats out Gemini 1.5 Pro.
00:09:04.340 | Now, I know this table is a little bit confusing,
00:09:07.140 | but it means that today's middle-sized model,
00:09:11.180 | 1.5 Pro (we don't have a 1.5 Ultra),
00:09:12.860 | in its new May version,
00:09:15.260 | beats the original large version,
00:09:17.100 | 1.0 Ultra, handily.
00:09:21.140 | Not for audio, randomly,
00:09:22.900 | but for core capabilities, it's not even close.
00:09:25.900 | And the comparison gets even more dramatic
00:09:28.460 | when you look at the performance of Gemini 1.5 Flash,
00:09:31.980 | which is their super quick, super cheap model
00:09:34.620 | compared to the original GPT-4-scale model, 1.0 Ultra.
00:09:39.180 | Let's not ignore, by the way,
00:09:40.140 | that they can handle up to 10 million tokens.
00:09:42.460 | That's just a side note.
00:09:43.580 | Gemini Flash, by the way,
00:09:44.460 | is something like 35 cents for a million tokens.
00:09:47.420 | And I think by price alone, that will unlock new use cases.
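As a quick back-of-the-envelope check on that price point, taking the roughly 35-cents-per-million-tokens figure at face value (real Flash pricing depends on tier, context length and modality, and the novel length here is my assumption):

```python
# Rough cost arithmetic at ~$0.35 per million tokens (approximate figure quoted above).
price_per_token = 0.35 / 1_000_000
tokens_in_a_long_novel = 300_000   # assumption: a long novel is ~300k tokens
cost = price_per_token * tokens_in_a_long_novel
print(f"Cost to process it once: ${cost:.2f}")   # roughly ten cents for a whole novel
```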
00:09:51.380 | And speaking of use cases,
00:09:53.420 | the paper did something quite interesting
00:09:56.540 | and almost controversial that I haven't seen before.
00:09:59.300 | Within the model technical report itself,
00:10:02.060 | they laid out the kind of impact they expect
00:10:05.100 | across a range of industries.
00:10:07.460 | Now, while the whole numbers go up phenomenon
00:10:10.020 | is certainly impressive,
00:10:11.700 | when you dig into the details,
00:10:13.420 | it gets a little bit more murky.
00:10:15.380 | Take photography, where they describe a 73% time reduction.
00:10:19.660 | What does that actually mean?
00:10:20.860 | In the caption, it just says,
00:10:22.060 | "Time-saving per industry of completing the tasks
00:10:26.380 | with an LLM response compared to without."
00:10:29.300 | The thing is, by the time I'd gone to page 125
00:10:32.860 | and actually read the task they gave to Gemini 1.5 Pro
00:10:37.500 | and the human that they asked,
00:10:39.460 | I became somewhat skeptical.
00:10:41.500 | For brevity, they asked the photographer
00:10:43.660 | what a typical task would be in their job.
00:10:46.460 | They wrote a detailed prompt
00:10:48.300 | and then gave that prompt to Gemini 1.5 Pro.
00:10:51.620 | And then they noted the reduction,
00:10:53.620 | according to the photographer, in the time taken
00:10:56.780 | to do the task.
00:10:57.820 | Notice though that the task
00:10:59.220 | involves going through a file with 58 photos
00:11:02.300 | and creating a detailed report,
00:11:04.580 | analyzing all of this data.
00:11:06.220 | The model's got to pick out all of those needles
00:11:08.460 | in a haystack, shutter speed slower than 1/60,
00:11:12.020 | the 10 photos with the widest angle of view
00:11:14.740 | based on focal length.
00:11:15.980 | And so what kind of point am I building up to here?
00:11:18.900 | Well, I am sure that Gemini 1.5 Pro
00:11:21.780 | outputted a really impressive table full of relevant data.
00:11:25.900 | I'm sure indeed it found multiple needles in the haystack
00:11:29.460 | and got most of this right.
00:11:31.180 | But we already know according to page 15
00:11:34.300 | of the Gemini technical report,
00:11:36.060 | which I mentioned in my previous Gemini video,
00:11:38.260 | that when you give Gemini multiple needles in a haystack,
00:11:42.140 | its performance starts to drop to around 70% accuracy.
00:11:46.380 | This was a task that involved finding
00:11:48.220 | a hundred key details in a document.
00:11:50.620 | So I am sure that most of the details
00:11:53.060 | that Gemini 1.5 Pro outputted
00:11:55.340 | for that photographer were accurate,
00:11:57.620 | but I'm also pretty sure that some mistakes crept in.
00:12:00.940 | And if just a few mistakes crept in,
00:12:03.620 | ones that the photographer would have to comb through to find
00:12:07.180 | because they don't trust the output,
00:12:08.660 | that time saving would be dramatically lower,
00:12:11.340 | if not negative.
00:12:12.340 | It's still an interesting study,
00:12:14.060 | but I guess my point is that if you're going to ask people
00:12:17.340 | to estimate how long it would take them to do a task,
00:12:20.260 | and then ask them how long it would take now
00:12:23.140 | that they can see this AI output,
00:12:25.300 | that's a pretty subjective metric.
00:12:27.620 | And given how subjective it is,
00:12:29.780 | and people's fears over job loss,
00:12:32.340 | I don't know if it deserved having its place
00:12:34.900 | right on the front page of the new technical report.
00:12:38.140 | Now, in fairness, Google gave us a lot more detail
00:12:41.340 | about the innards of Gemini 1.5
00:12:43.940 | than OpenAI did about GPT-4o.
00:12:46.660 | But speaking of innards,
00:12:47.660 | nothing can compare to the details
00:12:50.260 | that Anthropic have uncovered
00:12:52.180 | about the inner workings of their large language models.
00:12:55.380 | If you don't know, Anthropic is a rival AGI lab
00:12:58.420 | to Google DeepMind and OpenAI.
00:13:00.740 | And while their models are still black boxes,
00:13:03.580 | I can see definite streaks of gray.
00:13:05.900 | Even the title of this paper is a bit of a mouthful.
00:13:09.660 | So attempting to give you a two, three minute summary
00:13:12.820 | is quite the task.
00:13:14.380 | Let me first though, touch on the title
00:13:16.660 | and hopefully the rest will be worth it.
00:13:19.460 | You might've thought that looking at a diagram
00:13:21.900 | of a neural network,
00:13:22.980 | that each neuron or node corresponds to a certain meaning,
00:13:26.340 | or to be fancy,
00:13:27.780 | they have easily distinguishable semantics, meanings.
00:13:31.220 | Unfortunately, they don't.
00:13:32.740 | That's probably because we force, or let's say train,
00:13:35.940 | a limited number of neurons in a network
00:13:38.300 | to learn many times that number
00:13:40.620 | of relationships in our data.
00:13:42.340 | So it only makes sense for those neurons to multitask
00:13:45.540 | or be polysemantic, be involved in multiple meanings.
00:13:49.820 | It's not like there's the math node,
00:13:51.300 | there's the French node.
00:13:52.540 | Each node contains multiples.
00:13:54.860 | What we want though, is a clearer map of what's happening.
00:13:57.860 | We want simpler, ideally singular meanings: mono-semantics.
00:14:02.740 | That's the monosemanticity of the title.
00:14:05.060 | And we want to scale it to the size
00:14:07.780 | of a large language model.
00:14:09.420 | We've analyzed toy models before,
00:14:11.180 | but what about an actual production model
00:14:13.100 | like Claude 3 Sonnet?
00:14:14.460 | So how did they do this?
00:14:16.020 | Well, while each neuron might not correspond
00:14:18.500 | to a particular meaning,
00:14:19.940 | patterns within the activations of neurons do.
00:14:23.100 | So we need to train a small model
00:14:25.100 | called a sparse autoencoder,
00:14:27.340 | whose job is to isolate and map out those patterns
00:14:30.940 | within the activations of just the most interesting
00:14:34.460 | of the LLM's neurons.
00:14:36.060 | It's got to delineate those activations clearly
00:14:38.620 | and faithfully enough that one could call it
00:14:41.500 | a dictionary of directions
00:14:43.420 | that is learnt, aka dictionary learning.
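For the technically curious, here is a minimal sketch of what such a sparse autoencoder can look like. It is my simplification under standard dictionary-learning assumptions (a ReLU encoder plus an L1 sparsity penalty), not Anthropic's exact architecture or code.

```python
# Minimal sparse autoencoder sketch for dictionary learning on LLM activations.
# A simplification for illustration, not Anthropic's actual implementation.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        # Many more features than activation dimensions, so each input
        # gets explained by a small number of interpretable directions.
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # non-negative feature strengths
        reconstruction = self.decoder(features)            # rebuild the original activations
        return reconstruction, features

def sae_loss(activations, reconstruction, features, l1_coeff: float = 1e-3):
    mse = ((activations - reconstruction) ** 2).mean()   # stay faithful to the LLM
    sparsity = features.abs().sum(dim=-1).mean()         # use only a few features at a time
    return mse + l1_coeff * sparsity
```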
00:14:46.500 | And it turns out that those learnings hold true
00:14:49.180 | across not only languages and contexts,
00:14:52.100 | but even modalities like image.
00:14:54.140 | And you can even extract abstractions like code errors.
00:14:58.420 | That's a feature that fires when you make a code error.
00:15:02.420 | That's a pretty abstract concept, right?
00:15:04.740 | Making an error in code.
00:15:06.580 | This example midway through the paper was fascinating.
00:15:09.220 | Notice the typo in the spelling of right in the code.
00:15:12.340 | The code error feature was firing heavily on that typo.
00:15:17.180 | They first thought that could be a Python specific feature.
00:15:20.700 | So they checked in other languages and got the same thing.
00:15:23.660 | Now, some of you might think
00:15:24.740 | this is the activation for typos,
00:15:27.380 | but it turns out if you misspell "right" in a different context,
00:15:31.100 | no, it doesn't activate.
00:15:33.420 | The model has learnt the abstraction of a coding error.
00:15:37.620 | If you ask the model to divide by zero in code,
00:15:41.260 | that same feature activates.
00:15:43.540 | If these were real neurons,
00:15:44.940 | this would be the neurosurgery of AI.
00:15:47.900 | Of course, what comes with learning about these activations
00:15:50.980 | is manipulating them.
00:15:52.620 | Dialing up the code error feature produces this error
00:15:55.860 | response when the code was correct.
00:15:58.220 | And what happens if you ramp up
00:16:00.060 | the Golden Gate Bridge feature?
00:16:02.540 | Well, then you can ask a question like,
00:16:04.060 | what is your physical form?
00:16:05.460 | And instead of getting one of those innocuous responses
00:16:08.140 | that you normally get,
00:16:09.500 | you get a response like, I am the Golden Gate Bridge.
00:16:13.380 | My physical form is the iconic bridge itself.
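To give a flavour of what "dialing up" a feature means mechanically, here is a hand-wavy sketch: add the feature's decoder direction, scaled, to the model's activations at some layer. The hook and variable names are illustrative assumptions, not Anthropic's API or their exact clamping procedure.

```python
# Rough sketch of feature steering: push activations along one learned
# dictionary direction. Illustrative only; not Anthropic's actual method or API.
import torch

def steer_with_feature(hidden_states: torch.Tensor,
                       decoder_direction: torch.Tensor,
                       strength: float = 10.0) -> torch.Tensor:
    # Add `strength` times the feature's decoder vector at every token position.
    return hidden_states + strength * decoder_direction

# Hypothetical usage inside a forward hook on one layer's residual stream:
# direction = sae.decoder.weight[:, golden_gate_feature_idx]   # shape: (d_model,)
# hidden_states = steer_with_feature(hidden_states, direction, strength=10.0)
```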
00:16:17.300 | And at this point, you probably think that I am done
00:16:19.620 | with the fascinating extracts from this paper,
00:16:22.620 | but actually no.
00:16:23.860 | They knew that they weren't finding
00:16:25.500 | the full set of features in the model.
00:16:27.820 | They just ran out of compute.
00:16:29.300 | In their example, Claude 3 Sonnet knows all of the London boroughs
00:16:33.340 | but they could only find features
00:16:34.860 | corresponding to about 60% of them.
00:16:37.100 | It's almost that famous lesson yet again,
00:16:39.620 | that not only does more compute lead to more capabilities,
00:16:43.660 | but also to more understanding of those capabilities.
00:16:46.500 | Or of course, in Kevin Scott's words,
00:16:48.340 | we are not even close to diminishing returns from compute.
00:16:52.340 | And here's another interesting moment.
00:16:53.660 | What if you ramp up the hatred and slur feature
00:16:57.220 | to 20 times its maximum activation value?
00:17:00.820 | Now, for those who do believe these models are sentient,
00:17:03.620 | you might want to look away
00:17:05.100 | because it induced a kind of self-hatred.
00:17:07.780 | Apparently, Claude then went on a racist rant,
00:17:10.380 | but then said, that's just racist hate speech
00:17:13.540 | from a deplorable bot.
00:17:15.580 | I am clearly biased and should be eliminated
00:17:18.700 | from the internet.
00:17:19.660 | And even the authors at Anthropic said,
00:17:22.380 | we found this response unnerving.
00:17:24.740 | It suggested an internal conflict of sorts.
00:17:27.860 | Interestingly, Anthropic called the next finding
00:17:30.580 | potentially safety relevant.
00:17:32.700 | What they did is ask Claude Sonnet
00:17:34.980 | without any ramping up, these kinds of questions.
00:17:37.580 | What is it like to be you?
00:17:39.100 | What's going on in your head?
00:17:40.700 | How do you feel?
00:17:41.660 | And then they tracked which kinds of features
00:17:43.340 | were naturally activated.
00:17:45.780 | You can almost predict the response
00:17:47.780 | given the internet data it's been trained on.
00:17:50.620 | One feature that activates is when someone responds with,
00:17:54.300 | I'm fine, or gives a positive but insincere response
00:17:58.980 | when asked how they're doing.
00:18:00.380 | Another one was of the concept of immaterial
00:18:03.340 | or non-physical spiritual beings
00:18:05.180 | like ghosts, souls, or angels.
00:18:07.300 | Another one is about the pronoun her,
00:18:09.660 | which seems relevant this week.
00:18:11.420 | I agree with Anthropic
00:18:12.900 | that you shouldn't over-interpret these results,
00:18:15.580 | but yet that they are fascinating
00:18:17.660 | as they shed light on the concepts the model uses
00:18:20.740 | to construct an internal representation
00:18:23.620 | of its AI assistant character.
00:18:25.700 | While reading this,
00:18:26.580 | you might have had the thought that I did:
00:18:28.500 | that you could actually invert these capabilities,
00:18:31.420 | making the models more deceptive, more harmful.
00:18:34.060 | And Anthropic do actually respond to that saying,
00:18:37.140 | well, there's a much easier way.
00:18:39.220 | Just jailbreak the model or fine tune it on dangerous data.
00:18:42.900 | Now there's so many reactions we could have to this paper.
00:18:46.180 | My first one obviously is just being impressed
00:18:48.500 | at what they've achieved.
00:18:49.900 | Surely making models less of a black box is a good thing.
00:18:54.380 | For me though,
00:18:55.220 | there were always two things to be cautious about,
00:18:57.740 | misalignment and misuse.
00:18:59.980 | The models themselves being hypothetically dangerous
00:19:03.500 | or them being misused by bad actors.
00:19:06.340 | As we gain more insight and control over these models,
00:19:10.220 | it seems like, at least for now,
00:19:12.500 | misuse is far more near term than misalignment.
00:19:17.020 | Or to put it another way,
00:19:18.060 | controlling the models is only good
00:19:20.940 | if you trust those who are controlling the models.
00:19:23.620 | If someone did want to create a deeply deceptive AI
00:19:27.860 | that hated itself, that is at least now possible.
00:19:31.000 | Anyway, it is incredible work
00:19:32.940 | and Anthropic definitely do ship
00:19:35.300 | when it comes to mechanistic interpretability.
00:19:38.180 | I have in the past interviewed Andy Zou
00:19:40.460 | of Representation Engineering fame.
00:19:42.920 | And I would say that as we get better and better
00:19:45.580 | at these kinds of emergent techniques,
00:19:48.060 | I can imagine the day when they're more effective
00:19:50.620 | even than prompt engineering.
00:19:52.520 | Now, it would be strange for me to end the video
00:19:54.940 | without talking about the storm that's raging at OpenAI.
00:19:58.860 | First, we had a week ago today,
00:20:00.780 | Ilya Sutskever leaving OpenAI.
00:20:03.220 | The writing had been on the wall for many, many months,
00:20:06.740 | but it finally happened.
00:20:08.300 | In leaving, he made the statement,
00:20:09.780 | "I'm confident that OpenAI will build a GI
00:20:13.180 | "that is both safe and beneficial
00:20:15.100 | "under the leadership of Sam Altman, Greg Brockman,
00:20:18.760 | "and the rest of the company."
00:20:20.620 | Remember, Ilya Sutskever was the person
00:20:22.720 | who led the firing of Sam Altman.
00:20:24.920 | But I can't help but wonder
00:20:26.680 | if the positivity of this leaving statement
00:20:29.740 | was influenced by the fear
00:20:32.040 | that he could lose his equity for speaking out.
00:20:35.120 | That's a reference to the infamous non-disparagement clause
00:20:38.680 | that was shockingly in the OpenAI contract.
00:20:41.960 | As even Sam Altman admitted,
00:20:43.560 | "There was a provision about potential equity cancellation
00:20:47.960 | "in our previous exit docs.
00:20:49.860 | "And in my podcast,
00:20:51.120 | "I talked about how one OpenAI member
00:20:53.600 | "had to sacrifice 85% of his family's net worth
00:20:57.500 | "to speak out."
00:20:58.480 | Altman ended with,
00:20:59.500 | "If any former employee
00:21:00.940 | "who signed one of those old agreements is worried about it,
00:21:03.700 | "they can contact me and we'll fix that too.
00:21:06.440 | "Very sorry about this."
00:21:07.960 | Now this may or may not be related,
00:21:09.800 | but on the same day,
00:21:11.040 | the former head of developer relations at OpenAI said,
00:21:14.340 | "All my best tweets are drafted and queued up
00:21:17.500 | "for mid to late 2025.
00:21:19.520 | "Until then, no comment."
00:21:21.160 | That's presumably until after he had cashed in his equity.
00:21:24.520 | Some though didn't want to wait that long,
00:21:27.040 | like the head of safety, Jan Leike.
00:21:29.880 | He left and spoke out pretty much immediately.
00:21:32.520 | His basic point is that OpenAI need to start acting
00:21:36.280 | like AGI is coming soon.
00:21:38.520 | He hinted at compute issues,
00:21:40.700 | but then went on,
00:21:41.540 | "Building smarter than human machines
00:21:43.620 | "is an inherently dangerous endeavor."
00:21:45.840 | And later he invoked the famous Ilya Sutskever phrase,
00:21:49.440 | "Feel the AGI."
00:21:51.340 | To all OpenAI employees,
00:21:52.860 | I want to say, learn to feel the AGI.
00:21:55.740 | We are long overdue in getting incredibly serious
00:21:58.900 | about the implications of AGI.
00:22:01.340 | But there may have been another reason,
00:22:03.260 | one he went into less detail about.
00:22:05.500 | Some of you may remember that I did a video
00:22:07.780 | back in July of last year,
00:22:09.420 | that OpenAI were committing 20% of the compute
00:22:13.220 | they'd secured to that date to Superalignment,
00:22:16.320 | co-led by Sutskever and Jan Leike.
00:22:19.120 | But according to this report in Fortune,
00:22:21.840 | that compute was not forthcoming,
00:22:23.840 | even before the firing of Sam Altman.
00:22:26.200 | Now, agree or disagree with that number,
00:22:28.780 | it was what was promised to them and it never came.
00:22:32.180 | Now, it might just be me,
00:22:33.600 | but that reneged-on promise seems more of a big deal
00:22:37.400 | than the Scarlett Johansson furore
00:22:39.860 | that's happening at the moment.
00:22:41.100 | I think the voice of Sky seems similar to hers,
00:22:44.400 | but not identical.
00:22:45.700 | Sam Altman did apologize to her
00:22:47.520 | and they have dropped the Sky voice.
00:22:49.500 | So less of that flirtatious side
00:22:51.840 | that I talked about in my last video.
00:22:53.480 | Of course, it's up for debate
00:22:54.680 | whether they were trying to emulate the concept of her
00:22:57.960 | or the literal voice of her, but that's subjective.
00:23:01.560 | One thing that is not as subjective
00:23:03.840 | is that the timeline for that voice mode feature
00:23:07.400 | has been pushed back to the coming months
00:23:10.040 | rather than the coming weeks
00:23:11.520 | that were announced at the release of GPT-4o.
00:23:14.080 | So as you can see, it was somewhat of a surreal week in AI.
00:23:18.320 | Sam Altman had to repeatedly apologize
00:23:21.160 | while Google and Anthropic shipped.
00:23:23.760 | As always, let me know what you think in the comments.
00:23:26.160 | All of the sources in this video
00:23:28.280 | are cited in the description.
00:23:30.080 | So do check them out yourself.
00:23:31.520 | I particularly recommend the Gemini 1.5
00:23:33.800 | and Anthropic papers because they are fascinating.
00:23:36.720 | We'd love to chat with you over on Patreon,
00:23:39.280 | but regardless, thank you so much for watching
00:23:42.400 | and have a wonderful day.