
Microsoft Promises a 'Whale' for GPT-5, Anthropic Delves Inside a Model’s Mind and Altman Stumbles



00:00:00.000 | While Microsoft spend billions on shipping a whale-sized GPT-5,
00:00:05.920 | OpenAI gets tossed about in a storm of its own creation.
00:00:10.860 | Meanwhile, Google revealed powerful new details
00:00:13.920 | about the Gemini models that many will have missed.
00:00:17.560 | And then it was just yesterday that Anthropic showed us
00:00:20.720 | how they are the closest to understanding
00:00:24.440 | what goes on at the very core of a large language model.
00:00:28.720 | But I want to start with Kevin Scott, the CTO of Microsoft,
00:00:33.120 | who said something which, if true,
00:00:35.760 | is the biggest news of the week and even the month.
00:00:39.640 | According to him, we are not even close
00:00:42.000 | to diminishing returns with the power of AI models.
00:00:45.760 | Since about 2012, that rate of increase in compute
00:00:50.440 | when applied to training has been increasing exponentially.
00:00:54.280 | And we are nowhere near the point
00:00:56.560 | of diminishing marginal returns on how powerful
00:00:59.440 | we can make AI models as we increase the scale of compute.
00:01:03.240 | As we'll see, Kevin Scott knows both the size
00:01:05.920 | and power of GPT-5, if that's what they call it.
00:01:09.440 | So these words have more weight than you might think.
00:01:12.280 | And while we're speaking of exponentials,
00:01:14.120 | AI models are undeniably becoming faster and cheaper.
00:01:18.720 | While we're off building bigger supercomputers
00:01:21.640 | to get the next big models out
00:01:23.680 | and to deliver more and more capability to you,
00:01:25.920 | like we're also grinding away on making
00:01:29.120 | the current generation of models much, much more efficient.
00:01:32.840 | So between the launch of GPT-4,
00:01:35.520 | which is not quite a year and a half ago now,
00:01:38.600 | it's 12 times cheaper to make a call to GPT-4o
00:01:42.800 | than the original GPT-4 model.
00:01:46.160 | And it's also six times faster
00:01:47.960 | in terms of like time to first token response.
00:01:51.040 | - On this channel, admittedly, I am laser focused
00:01:53.800 | on the growing intelligence of models,
00:01:56.560 | but this massive drop in cost
00:01:59.400 | does have some pretty profound ramifications too.
00:02:02.200 | It is kind of an obvious point,
00:02:03.800 | but when we get the first generally intelligent AI model,
00:02:07.360 | we will soon get ubiquitous AI models.
00:02:10.720 | Unless it gets monopolized, artificial intelligence,
00:02:13.640 | if it carries on getting cheaper and cheaper,
00:02:16.080 | could become absolutely pervasive
00:02:18.640 | inside your toaster and security camera,
00:02:21.280 | not just your laptop.
00:02:22.440 | Anyway, I promised you a whale analogy and here it is.
00:02:26.120 | - There's this like really beautiful relationship right now
00:02:29.320 | between the sort of exponential progression of compute
00:02:31.800 | that we're applying to building the platform,
00:02:34.440 | to the capability and power of the platform that we get.
00:02:37.720 | And I just wanted to, you know, sort of without,
00:02:40.120 | without mentioning numbers, which is sort of hard to do,
00:02:44.880 | to give you all an idea of the scaling of these systems.
00:02:49.480 | So in 2020, we built our first AI supercomputer for OpenAI.
00:02:54.480 | It's the supercomputing environment that trained GPT-3.
00:02:59.080 | And so like, we're gonna just choose marine wildlife
00:03:03.240 | as our scale marker.
00:03:04.880 | So you can think of that system about as big as a shark.
00:03:09.840 | So the next system that we built,
00:03:13.400 | scale-wise is about as big as an orca.
00:03:16.960 | And like, that is the system that we delivered in 2022
00:03:20.880 | that trained GPT-4.
00:03:23.000 | The system that we have just deployed is like scale-wise,
00:03:28.640 | about as big as a whale relative to like, you know,
00:03:32.120 | the shark-sized supercomputer
00:03:33.800 | and this orca-sized supercomputer.
00:03:35.680 | And it turns out like you can build a whole hell of a lot
00:03:37.880 | of AI with a whale-sized supercomputer.
00:03:40.200 | Just want everybody to really, really be thinking clearly
00:03:43.360 | about, and like, this is gonna be our segue
00:03:46.280 | to talking with Sam, is the next sample is coming.
00:03:49.800 | So like, this whale-sized supercomputer
00:03:52.120 | is hard at work right now,
00:03:53.920 | building the next set of capabilities
00:03:55.880 | that we're going to put into your hands
00:03:58.280 | so that you all can do the next round
00:04:00.760 | of amazing things with it.
00:04:02.360 | - As for the actual release date
00:04:03.800 | of this mysterious whale-sized model,
00:04:06.400 | Sam Altman would give no hint and Kevin Scott
00:04:09.480 | just described it as being within K months.
00:04:12.760 | On a quick side note, when one commenter said
00:04:15.200 | that GPT-4o, as good as it is,
00:04:17.400 | shows that OpenAI simply don't know
00:04:19.760 | how to produce further capability advances,
00:04:22.120 | that they can't do exponential improvements,
00:04:24.360 | and that they don't have GPT-5 even after 14 months of trying,
00:04:28.400 | the response from the head of Frontiers Research at OpenAI
00:04:32.560 | was, "Remind me in six months."
00:04:35.360 | I'm gonna leave OpenAI for a moment
00:04:37.360 | because I want to focus this video on Google and Anthropic
00:04:41.400 | who have both shipped very interesting developments.
00:04:45.240 | And first I want to focus on Google
00:04:47.160 | because I really feel like they buried the lead
00:04:49.920 | at the recent Google I/O event.
00:04:52.160 | They made, by my count, 123 mentions of AI,
00:04:55.920 | but didn't detail the improvements
00:04:58.000 | to their impressive Gemini 1.5 Pro.
00:05:01.000 | And they barely mentioned Gemini 1.5 Flash,
00:05:04.080 | which was trained in part
00:05:05.960 | by imitating the output of Gemini 1.5 Pro.
00:05:09.640 | The weird thing for me is that I had already read
00:05:11.720 | the 100 plus page Gemini report and done a video on it,
00:05:15.480 | but this refreshed report was so interesting,
00:05:19.120 | I counted a dozen new insights.
00:05:21.640 | I'm only gonna talk about around five today,
00:05:23.760 | otherwise this video would be way too long,
00:05:26.320 | but I will be coming back to this paper.
00:05:28.560 | The first thing to know is that you can already play about
00:05:31.360 | with these models in the Google AI Studio.
00:05:34.200 | Both Gemini 1.5 Pro and 1.5 Flash accept video input, image input,
00:05:38.560 | and text input, up to, for now, 1 million tokens.
00:05:42.120 | That's way more than GPT-4o.
00:05:44.520 | Admittedly, Gemini 1.5 Pro does not have the rizz of GPT-4o,
00:05:49.520 | but there are prizes for making impactful apps with it.
00:05:53.560 | Back to the highlights of the paper though,
00:05:56.080 | and page 43, which I found really interesting.
00:06:00.360 | If you've been following the channel for a while,
00:06:02.600 | you'd know that adaptive compute,
00:06:05.320 | or essentially letting the models think for longer,
00:06:08.080 | is a very promising direction
00:06:10.640 | in advancing the intelligence of models.
00:06:12.960 | Well, this update to the paper was the first time
00:06:16.200 | I saw it in action with a current
00:06:19.000 | state-of-the-art large language model.
00:06:20.920 | Google wanted to understand how far they could push
00:06:25.280 | the quantitative reasoning capabilities
00:06:27.600 | of large language models,
00:06:28.960 | and they describe how mathematicians often benefit
00:06:31.900 | from extended periods of thought or contemplation
00:06:35.720 | while formulating solutions.
00:06:37.580 | And critically, they aim to emulate this
00:06:41.920 | by training a math-specialized model
00:06:44.680 | and providing it additional inference time computation,
00:06:48.160 | allowing it, they say,
00:06:49.000 | to explore a wider range of possibilities.
00:06:51.600 | If you want more background, do check out my Q* video,
00:06:54.320 | but if this general approach works,
00:06:56.440 | it means you could potentially squeeze out
00:06:58.920 | orders of magnitude more intelligence
00:07:01.200 | from the same size of model.
00:07:03.040 | Remember too that any improvements during inference,
00:07:06.000 | when the model is actually outputting tokens,
00:07:08.420 | would be complementary to,
00:07:10.140 | that is in addition to, improvements derived from scale,
00:07:13.620 | aka growing the models into giant whales.
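To make that adaptive compute idea concrete, here is a minimal sketch of one common way to spend extra inference-time computation, self-consistency sampling. The Gemini report does not spell out the exact mechanism, so treat this as an illustrative assumption rather than Google's implementation; `sample_fn` is a hypothetical stand-in for a single LLM call.

```python
# Illustrative sketch of "letting the model think for longer" via
# self-consistency: sample many independent solutions, keep the majority answer.
# An assumption for illustration, not Google's actual recipe.
from collections import Counter
from typing import Callable

def solve_with_extra_compute(problem: str,
                             sample_fn: Callable[[str], str],
                             n_samples: int = 32) -> str:
    # More samples = more inference-time compute spent exploring possibilities.
    answers = [sample_fn(problem) for _ in range(n_samples)]
    # Majority vote over the final answers.
    return Counter(answers).most_common(1)[0][0]
```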
00:07:16.660 | So what were the results?
00:07:18.460 | Well, we got a new record score
00:07:20.820 | on the MATH benchmark of 91.1%.
00:07:24.540 | So impressive was that to many
00:07:26.660 | that the CEO of Google, Sundar Pichai, tweeted it out.
00:07:30.300 | With that particular result though,
00:07:32.620 | there is a slight asterisk
00:07:34.740 | because the benchmark itself,
00:07:36.900 | surprise, surprise, has some issues.
00:07:39.260 | If you want to know more about those issues
00:07:41.460 | and my first glimpse of optimism for benchmarks,
00:07:45.180 | do check out the AI Insiders tier on Patreon.
00:07:48.460 | Making that video was almost cathartic to me
00:07:50.860 | because by the end, for the first time,
00:07:52.980 | I actually had hope
00:07:54.180 | that we could benchmark models properly.
00:07:56.700 | And while you're on Insiders,
00:07:58.340 | if you use AI agents at all in enterprise
00:08:01.980 | or are thinking of doing so,
00:08:03.740 | do check out our AI Insider resident expert,
00:08:06.900 | Donato Capitella, on prompt injections
00:08:09.900 | in the AI agent era.
00:08:11.820 | The effect of that extra thinking time though,
00:08:14.420 | was pretty dramatic for other benchmarks too,
00:08:17.500 | especially if you compare the performance
00:08:19.820 | of this math-specialized 1.5 Pro to, say, Claude 3 Opus.
00:08:24.580 | Of course, I wish the paper gave more details,
00:08:27.260 | but they do say the increased performance
00:08:29.580 | was achieved without code execution,
00:08:31.700 | theorem-proving libraries, Google Search or other tools.
00:08:34.700 | Moreover, the performance is on par
00:08:36.620 | with human expert performance.
00:08:39.060 | Very quickly, before I move on from benchmarks,
00:08:41.740 | it would be somewhat remiss of me
00:08:43.780 | if I didn't point out the new record in the MMLU.
00:08:47.380 | Now, yes, it used extra sampling
00:08:49.420 | and the benchmark is somewhat broken,
00:08:51.860 | but in previous months,
00:08:53.420 | a score of 91.7% would have made headlines.
00:08:57.420 | It must be said that for most of the other benchmarks though,
00:09:00.500 | GPT-4o beats out Gemini 1.5 Pro.
00:09:04.340 | Now, I know this table is a little bit confusing,
00:09:07.140 | but it means that today's middle-sized model,
00:09:11.180 | 1.5 Pro (we don't have a 1.5 Ultra),
00:09:12.860 | in its new May version,
00:09:15.260 | beats the original large version,
00:09:17.100 | 1.0 Ultra, handily.
00:09:21.140 | Not for audio, randomly,
00:09:22.900 | but for core capabilities, it's not even close.
00:09:25.900 | And the comparison gets even more dramatic
00:09:28.460 | when you look at the performance of Gemini 1.5 Flash,
00:09:31.980 | which is their super quick, super cheap model
00:09:34.620 | compared to the original GPT-4-scale model, 1.0 Ultra.
00:09:39.180 | Let's not ignore, by the way,
00:09:40.140 | that they can handle up to 10 million tokens.
00:09:42.460 | That's just a side note.
00:09:43.580 | Gemini Flash, by the way,
00:09:44.460 | is something like 35 cents for a million tokens.
00:09:47.420 | And I think by price alone, that will unlock new use cases.
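As a quick back-of-the-envelope check on that price point, taking the roughly 35-cents-per-million-tokens figure at face value (real Flash pricing depends on tier, context length and modality, and the novel length here is my assumption):

```python
# Rough cost arithmetic at ~$0.35 per million tokens (approximate figure quoted above).
price_per_token = 0.35 / 1_000_000
tokens_in_a_long_novel = 300_000   # assumption: a long novel is ~300k tokens
cost = price_per_token * tokens_in_a_long_novel
print(f"Cost to process it once: ${cost:.2f}")   # roughly ten cents for a whole novel
```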
00:09:51.380 | And speaking of use cases,
00:09:53.420 | the paper did something quite interesting
00:09:56.540 | and almost controversial that I haven't seen before.
00:09:59.300 | Within the model technical report itself,
00:10:02.060 | they laid out the kind of impact they expect
00:10:05.100 | across a range of industries.
00:10:07.460 | Now, while the whole numbers go up phenomenon
00:10:10.020 | is certainly impressive,
00:10:11.700 | when you dig into the details,
00:10:13.420 | it gets a little bit more murky.
00:10:15.380 | Take photography, where they describe a 73% time reduction.
00:10:19.660 | What does that actually mean?
00:10:20.860 | In the caption, it just says,
00:10:22.060 | "Time-saving per industry of completing the tasks
00:10:26.380 | with an LLM response compared to without."
00:10:29.300 | The thing is, by the time I'd gone to page 125
00:10:32.860 | and actually read the task they gave to Gemini 1.5 Pro
00:10:37.500 | and the human that they asked,
00:10:39.460 | I became somewhat skeptical.
00:10:41.500 | For brevity, they asked the photographer
00:10:43.660 | what a typical task would be in their job.
00:10:46.460 | They wrote a detailed prompt
00:10:48.300 | and then gave that prompt to Gemini 1.5 Pro.
00:10:51.620 | And then they noted the reduction,
00:10:53.620 | according to the photographer, in the time taken
00:10:56.780 | to do the task.
00:10:57.820 | Notice though that the task
00:10:59.220 | involves going through a file with 58 photos
00:11:02.300 | and creating a detailed report,
00:11:04.580 | analyzing all of this data.
00:11:06.220 | The model's got to pick out all of those needles
00:11:08.460 | in a haystack, shutter speed slower than 1/60,
00:11:12.020 | the 10 photos with the widest angle of view
00:11:14.740 | based on focal length.
00:11:15.980 | And so what kind of point am I building up to here?
00:11:18.900 | Well, I am sure that Gemini 1.5 Pro
00:11:21.780 | outputted a really impressive table full of relevant data.
00:11:25.900 | I'm sure indeed it found multiple needles in the haystack
00:11:29.460 | and got most of this right.
00:11:31.180 | But we already know according to page 15
00:11:34.300 | of the Gemini technical report,
00:11:36.060 | which I mentioned in my previous Gemini video,
00:11:38.260 | that when you give Gemini multiple needles in a haystack,
00:11:42.140 | its performance starts to drop to around 70% accuracy.
00:11:46.380 | This was a task that involved finding
00:11:48.220 | a hundred key details in a document.
00:11:50.620 | So I am sure that most of the details
00:11:53.060 | that Gemini 1.5 Pro outputted
00:11:55.340 | for that photographer were accurate,
00:11:57.620 | but I'm also pretty sure that some mistakes crept in.
00:12:00.940 | And if just a few mistakes crept in,
00:12:03.620 | ones that the photographer would have to comb through to find
00:12:07.180 | because they don't trust the output,
00:12:08.660 | that time saving would be dramatically lower,
00:12:11.340 | if not negative.
00:12:12.340 | It's still an interesting study,
00:12:14.060 | but I guess my point is that if you're going to ask people
00:12:17.340 | to estimate how long it would take them to do a task,
00:12:20.260 | and then ask them how long it would take now
00:12:23.140 | that they can see this AI output,
00:12:25.300 | that's a pretty subjective metric.
00:12:27.620 | And given how subjective it is,
00:12:29.780 | and people's fears over job loss,
00:12:32.340 | I don't know if it deserved having its place
00:12:34.900 | right on the front page of the new technical report.
00:12:38.140 | Now, in fairness, Google gave us a lot more detail
00:12:41.340 | about the innards of Gemini 1.5
00:12:43.940 | than OpenAI did about GPT-4o.
00:12:46.660 | But speaking of innards,
00:12:47.660 | nothing can compare to the details
00:12:50.260 | that Anthropic have uncovered
00:12:52.180 | about the inner workings of their large language models.
00:12:55.380 | If you don't know, Anthropic is a rival AGI lab
00:12:58.420 | to Google DeepMind and OpenAI.
00:13:00.740 | And while their models are still black boxes,
00:13:03.580 | I can see definite streaks of gray.
00:13:05.900 | Even the title of this paper is a bit of a mouthful.
00:13:09.660 | So attempting to give you a two, three minute summary
00:13:12.820 | is quite the task.
00:13:14.380 | Let me first though, touch on the title
00:13:16.660 | and hopefully the rest will be worth it.
00:13:19.460 | You might've thought that looking at a diagram
00:13:21.900 | of a neural network,
00:13:22.980 | that each neuron or node corresponds to a certain meaning,
00:13:26.340 | or to be fancy,
00:13:27.780 | they have easily distinguishable semantics, meanings.
00:13:31.220 | Unfortunately, they don't.
00:13:32.740 | That's probably because we force, or let's say train,
00:13:35.940 | a limited number of neurons in a network
00:13:38.300 | to learn many times that number
00:13:40.620 | of relationships in our data.
00:13:42.340 | So it only makes sense for those neurons to multitask
00:13:45.540 | or be polysemantic, be involved in multiple meanings.
00:13:49.820 | It's not like there's the math node,
00:13:51.300 | there's the French node.
00:13:52.540 | Each node contains multiples.
00:13:54.860 | What we want though, is a clearer map of what's happening.
00:13:57.860 | We want simpler, ideally singular meanings: mono-semantics.
00:14:02.740 | That's the monosemanticity of the title.
00:14:05.060 | And we want to scale it to the size
00:14:07.780 | of a large language model.
00:14:09.420 | We've analyzed toy models before,
00:14:11.180 | but what about an actual production model
00:14:13.100 | like Claude 3 Sonnet?
00:14:14.460 | So how did they do this?
00:14:16.020 | Well, while each neuron might not correspond
00:14:18.500 | to a particular meaning,
00:14:19.940 | patterns within the activations of neurons do.
00:14:23.100 | So we need to train a small model
00:14:25.100 | called a sparse autoencoder,
00:14:27.340 | whose job is to isolate and map out those patterns
00:14:30.940 | within the activations of just the most interesting
00:14:34.460 | of the LLM's neurons.
00:14:36.060 | It's got to delineate those activations clearly
00:14:38.620 | and faithfully enough that one could call it
00:14:41.500 | a dictionary of directions
00:14:43.420 | that is learnt, aka dictionary learning.
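For the technically curious, here is a minimal sketch of what such a sparse autoencoder can look like. It is my simplification under standard dictionary-learning assumptions (a ReLU encoder plus an L1 sparsity penalty), not Anthropic's exact architecture or code.

```python
# Minimal sparse autoencoder sketch for dictionary learning on LLM activations.
# A simplification for illustration, not Anthropic's actual implementation.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        # Many more features than activation dimensions, so each input
        # gets explained by a small number of interpretable directions.
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # non-negative feature strengths
        reconstruction = self.decoder(features)            # rebuild the original activations
        return reconstruction, features

def sae_loss(activations, reconstruction, features, l1_coeff: float = 1e-3):
    mse = ((activations - reconstruction) ** 2).mean()   # stay faithful to the LLM
    sparsity = features.abs().sum(dim=-1).mean()         # use only a few features at a time
    return mse + l1_coeff * sparsity
```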
00:14:46.500 | And it turns out that those learnings hold true
00:14:49.180 | across not only languages and contexts,
00:14:52.100 | but even modalities like image.
00:14:54.140 | And you can even extract abstractions like code errors.
00:14:58.420 | That's a feature that fires when you make a code error.
00:15:02.420 | That's a pretty abstract concept, right?
00:15:04.740 | Making an error in code.
00:15:06.580 | This example midway through the paper was fascinating.
00:15:09.220 | Notice the typo in the spelling of right in the code.
00:15:12.340 | The code error feature was firing heavily on that typo.
00:15:17.180 | They first thought that could be a Python specific feature.
00:15:20.700 | So they checked in other languages and got the same thing.
00:15:23.660 | Now, some of you might think
00:15:24.740 | this is the activation for typos,
00:15:27.380 | but it turns out if you misspell "right" in a different context,
00:15:31.100 | no, it doesn't activate.
00:15:33.420 | The model has learnt the abstraction of a coding error.
00:15:37.620 | If you ask the model to divide by zero in code,
00:15:41.260 | that same feature activates.
00:15:43.540 | If these were real neurons,
00:15:44.940 | this would be the neurosurgery of AI.
00:15:47.900 | Of course, what comes with learning about these activations
00:15:50.980 | is manipulating them.
00:15:52.620 | Dialing up the code error feature produces this error
00:15:55.860 | response when the code was correct.
00:15:58.220 | And what happens if you ramp up
00:16:00.060 | the Golden Gate Bridge feature?
00:16:02.540 | Well, then you can ask a question like,
00:16:04.060 | what is your physical form?
00:16:05.460 | And instead of getting one of those innocuous responses
00:16:08.140 | that you normally get,
00:16:09.500 | you get a response like, I am the Golden Gate Bridge.
00:16:13.380 | My physical form is the iconic bridge itself.
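To give a flavour of what "dialing up" a feature means mechanically, here is a hand-wavy sketch: add the feature's decoder direction, scaled, to the model's activations at some layer. The hook and variable names are illustrative assumptions, not Anthropic's API or their exact clamping procedure.

```python
# Rough sketch of feature steering: push activations along one learned
# dictionary direction. Illustrative only; not Anthropic's actual method or API.
import torch

def steer_with_feature(hidden_states: torch.Tensor,
                       decoder_direction: torch.Tensor,
                       strength: float = 10.0) -> torch.Tensor:
    # Add `strength` times the feature's decoder vector at every token position.
    return hidden_states + strength * decoder_direction

# Hypothetical usage inside a forward hook on one layer's residual stream:
# direction = sae.decoder.weight[:, golden_gate_feature_idx]   # shape: (d_model,)
# hidden_states = steer_with_feature(hidden_states, direction, strength=10.0)
```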
00:16:17.300 | And at this point, you probably think that I am done
00:16:19.620 | with the fascinating extracts from this paper,
00:16:22.620 | but actually no.
00:16:23.860 | They knew that they weren't finding
00:16:25.500 | the full set of features in the model.
00:16:27.820 | They just ran out of compute.
00:16:29.300 | In their example, Claude 3 Sonnet knows all of the London boroughs
00:16:33.340 | but they could only find features
00:16:34.860 | corresponding to about 60% of them.
00:16:37.100 | It's almost that famous lesson yet again,
00:16:39.620 | that not only does more compute lead to more capabilities,
00:16:43.660 | but also to more understanding of those capabilities.
00:16:46.500 | Or of course, in Kevin Scott's words,
00:16:48.340 | we are not even close to diminishing returns from compute.
00:16:52.340 | And here's another interesting moment.
00:16:53.660 | What if you ramp up the hatred and slur feature
00:16:57.220 | to 20 times its maximum activation value?
00:17:00.820 | Now, for those who do believe these models are sentient,
00:17:03.620 | you might want to look away
00:17:05.100 | because it induced a kind of self-hatred.
00:17:07.780 | Apparently, Claude then went on a racist rant,
00:17:10.380 | but then said, that's just racist hate speech
00:17:13.540 | from a deplorable bot.
00:17:15.580 | I am clearly biased and should be eliminated
00:17:18.700 | from the internet.
00:17:19.660 | And even the authors at Anthropic said,
00:17:22.380 | we found this response unnerving.
00:17:24.740 | It suggested an internal conflict of sorts.
00:17:27.860 | Interestingly, Anthropic called the next finding
00:17:30.580 | potentially safety relevant.
00:17:32.700 | What they did is ask Claude Sonnet
00:17:34.980 | without any ramping up, these kinds of questions.
00:17:37.580 | What is it like to be you?
00:17:39.100 | What's going on in your head?
00:17:40.700 | How do you feel?
00:17:41.660 | And then they tracked which kinds of features
00:17:43.340 | were naturally activated.
00:17:45.780 | You can almost predict the response
00:17:47.780 | given the internet data it's been trained on.
00:17:50.620 | One feature that activates is when someone responds with,
00:17:54.300 | I'm fine, or gives a positive but insincere response
00:17:58.980 | when asked how they're doing.
00:18:00.380 | Another one was of the concept of immaterial
00:18:03.340 | or non-physical spiritual beings
00:18:05.180 | like ghosts, souls, or angels.
00:18:07.300 | Another one is about the pronoun her,
00:18:09.660 | which seems relevant this week.
00:18:11.420 | I agree with Anthropic
00:18:12.900 | that you shouldn't over-interpret these results,
00:18:15.580 | but yet that they are fascinating
00:18:17.660 | as they shed light on the concepts the model uses
00:18:20.740 | to construct an internal representation
00:18:23.620 | of its AI assistant character.
00:18:25.700 | While reading this,
00:18:26.580 | you might have had the thought that I did:
00:18:28.500 | that you could actually invert these capabilities,
00:18:31.420 | making the models more deceptive, more harmful.
00:18:34.060 | And Anthropic do actually respond to that saying,
00:18:37.140 | well, there's a much easier way.
00:18:39.220 | Just jailbreak the model or fine tune it on dangerous data.
00:18:42.900 | Now there's so many reactions we could have to this paper.
00:18:46.180 | My first one obviously is just being impressed
00:18:48.500 | at what they've achieved.
00:18:49.900 | Surely making models less of a black box is a good thing.
00:18:54.380 | For me though,
00:18:55.220 | there were always two things to be cautious about,
00:18:57.740 | misalignment and misuse.
00:18:59.980 | The models themselves being hypothetically dangerous
00:19:03.500 | or them being misused by bad actors.
00:19:06.340 | As we gain more insight and control over these models,
00:19:10.220 | it seems like, at least for now,
00:19:12.500 | misuse is far more near term than misalignment.
00:19:17.020 | Or to put it another way,
00:19:18.060 | controlling the models is only good
00:19:20.940 | if you trust those who are controlling the models.
00:19:23.620 | If someone did want to create a deeply deceptive AI
00:19:27.860 | that hated itself, that is at least now possible.
00:19:31.000 | Anyway, it is incredible work
00:19:32.940 | and Anthropic definitely do ship
00:19:35.300 | when it comes to mechanistic interpretability.
00:19:38.180 | I have in the past interviewed Andy Zou
00:19:40.460 | of Representation Engineering fame.
00:19:42.920 | And I would say that as we get better and better
00:19:45.580 | at these kinds of emergent techniques,
00:19:48.060 | I can imagine the day when they're more effective
00:19:50.620 | even than prompt engineering.
00:19:52.520 | Now, it would be strange for me to end the video
00:19:54.940 | without talking about the storm that's raging at OpenAI.
00:19:58.860 | First, we had a week ago today,
00:20:00.780 | Ilya Sutskever leaving OpenAI.
00:20:03.220 | The writing had been on the wall for many, many months,
00:20:06.740 | but it finally happened.
00:20:08.300 | In leaving, he made the statement,
00:20:09.780 | "I'm confident that OpenAI will build a GI
00:20:13.180 | "that is both safe and beneficial
00:20:15.100 | "under the leadership of Sam Altman, Greg Brockman,
00:20:18.760 | "and the rest of the company."
00:20:20.620 | Remember, Ilya Sutskever was the person
00:20:22.720 | who led the firing of Sam Altman.
00:20:24.920 | But I can't help but wonder
00:20:26.680 | if the positivity of this leaving statement
00:20:29.740 | was influenced by the fear
00:20:32.040 | that he could lose his equity for speaking out.
00:20:35.120 | That's a reference to the infamous non-disparagement clause
00:20:38.680 | that was shockingly in the OpenAI contract.
00:20:41.960 | As even Sam Altman admitted,
00:20:43.560 | "There was a provision about potential equity cancellation
00:20:47.960 | "in our previous exit docs.
00:20:49.860 | "And in my podcast,
00:20:51.120 | "I talked about how one OpenAI member
00:20:53.600 | "had to sacrifice 85% of his family's net worth
00:20:57.500 | "to speak out."
00:20:58.480 | Altman ended with,
00:20:59.500 | "If any former employee
00:21:00.940 | "who signed one of those old agreements is worried about it,
00:21:03.700 | "they can contact me and we'll fix that too.
00:21:06.440 | "Very sorry about this."
00:21:07.960 | Now this may or may not be related,
00:21:09.800 | but on the same day,
00:21:11.040 | the former head of developer relations at OpenAI said,
00:21:14.340 | "All my best tweets are drafted and queued up
00:21:17.500 | "for mid to late 2025.
00:21:19.520 | "Until then, no comment."
00:21:21.160 | That's presumably until after he had cashed in his equity.
00:21:24.520 | Some though didn't want to wait that long,
00:21:27.040 | like the head of safety, Jan Leike.
00:21:29.880 | He left and spoke out pretty much immediately.
00:21:32.520 | His basic point is that OpenAI need to start acting
00:21:36.280 | like AGI is coming soon.
00:21:38.520 | He hinted at compute issues,
00:21:40.700 | but then went on,
00:21:41.540 | "Building smarter than human machines
00:21:43.620 | "is an inherently dangerous endeavor."
00:21:45.840 | And later he invoked the famous Ilya Sutskever phrase,
00:21:49.440 | "Feel the AGI."
00:21:51.340 | To all OpenAI employees,
00:21:52.860 | I want to say, learn to feel the AGI.
00:21:55.740 | We are long overdue in getting incredibly serious
00:21:58.900 | about the implications of AGI.
00:22:01.340 | But there may have been another reason,
00:22:03.260 | one he went into less detail about.
00:22:05.500 | Some of you may remember that I did a video
00:22:07.780 | back in July of last year,
00:22:09.420 | that OpenAI were committing 20% of the compute
00:22:13.220 | they'd secured to that date to Superalignment,
00:22:16.320 | co-led by Sutskever and Jan Leike.
00:22:19.120 | But according to this report in Fortune,
00:22:21.840 | that compute was not forthcoming,
00:22:23.840 | even before the firing of Sam Altman.
00:22:26.200 | Now, agree or disagree with that number,
00:22:28.780 | it was what was promised to them and it never came.
00:22:32.180 | Now, it might just be me,
00:22:33.600 | but that reneged-on promise seems more of a big deal
00:22:37.400 | than the Scarlett Johansson furore
00:22:39.860 | that's happening at the moment.
00:22:41.100 | I think the voice of Sky seems similar to hers,
00:22:44.400 | but not identical.
00:22:45.700 | Sam Altman did apologize to her
00:22:47.520 | and they have dropped the Sky voice.
00:22:49.500 | So less of that flirtatious side
00:22:51.840 | that I talked about in my last video.
00:22:53.480 | Of course, it's up for debate
00:22:54.680 | whether they were trying to emulate the concept of her
00:22:57.960 | or the literal voice of her, but that's subjective.
00:23:01.560 | One thing that is not as subjective
00:23:03.840 | is that the timeline for that voice mode feature
00:23:07.400 | has been pushed back to the coming months
00:23:10.040 | rather than the coming weeks
00:23:11.520 | that were announced at the release of GPT-4o.
00:23:14.080 | So as you can see, it was somewhat of a surreal week in AI.
00:23:18.320 | Sam Altman had to repeatedly apologize
00:23:21.160 | while Google and Anthropic shipped.
00:23:23.760 | As always, let me know what you think in the comments.
00:23:26.160 | All of the sources in this video
00:23:28.280 | are cited in the description.
00:23:30.080 | So do check them out yourself.
00:23:31.520 | I particularly recommend the Gemini 1.5
00:23:33.800 | and Anthropic papers because they are fascinating.
00:23:36.720 | We'd love to chat with you over on Patreon,
00:23:39.280 | but regardless, thank you so much for watching
00:23:42.400 | and have a wonderful day.