
The Wild World of AI: 6 Months That Changed Everything


Whisper Transcript

00:00:00.320 | There are all of these benchmarks full of numbers.
00:00:02.080 | I don't like the numbers.
00:00:03.280 | There are the leaderboards.
00:00:04.760 | I'm kind of beginning to lose trust in the leaderboards as well.
00:00:07.840 | So for my own work, I've been leaning increasingly
00:00:10.240 | into my own little benchmark, which started as a joke
00:00:12.960 | and has actually turned into something
00:00:14.680 | that I've learned quite a lot from.
00:00:15.840 | And that's this.
00:00:16.480 | I prompt models with "Generate an SVG
00:00:19.120 | of a pelican riding a bicycle."
00:00:22.000 | I have good reasons for this.
00:00:23.560 | Firstly, these are not image models.
00:00:25.240 | These are text models.
00:00:26.080 | They shouldn't be able to draw anything at all.
00:00:27.920 | But they can output code, and SVG is a kind of code.
00:00:30.960 | So that works.
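To make the "SVG is a kind of code" point concrete, here is a minimal hand-written sketch of the kind of plain-text markup a model has to emit for this benchmark. It is not output from any model mentioned in the talk. The function name `pelican_on_bicycle_svg` and all shapes and coordinates are hypothetical, chosen only to show that a "drawing" here is just text that a parser can check.

```python
import xml.etree.ElementTree as ET


def pelican_on_bicycle_svg() -> str:
    """Return a crude, hypothetical pelican-on-a-bicycle drawing as SVG text."""
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="160" viewBox="0 0 200 160">',
        # Two bicycle wheels
        '<circle cx="50" cy="120" r="30" fill="none" stroke="black" stroke-width="3"/>',
        '<circle cx="150" cy="120" r="30" fill="none" stroke="black" stroke-width="3"/>',
        # Frame: two lines joining the wheel hubs at the seat
        '<line x1="50" y1="120" x2="100" y2="90" stroke="black" stroke-width="3"/>',
        '<line x1="100" y1="90" x2="150" y2="120" stroke="black" stroke-width="3"/>',
        # Pelican body, head and beak
        '<ellipse cx="100" cy="70" rx="25" ry="18" fill="white" stroke="black"/>',
        '<circle cx="120" cy="40" r="10" fill="white" stroke="black"/>',
        '<polygon points="128,38 165,45 128,46" fill="orange" stroke="black"/>',
        "</svg>",
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    svg = pelican_on_bicycle_svg()
    # Parsing confirms the "drawing" is well-formed XML text
    ET.fromstring(svg)
    print(svg)
```

Saving the printed text to a `.svg` file and opening it in a browser renders the picture, which is essentially how the benchmark's outputs get judged.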
00:00:32.800 | Fast forward to January.
00:00:34.720 | And in January, we get DeepSeek again.
00:00:36.640 | DeepSeek strikes back.
00:00:38.160 | This is what happened to NVIDIA's stock price
00:00:41.600 | when DeepSeek R1 came out.
00:00:44.320 | I think it was the 27th of January.
00:00:46.080 | This was DeepSeek's first big reasoning model release.
00:00:48.600 | Again, open weights.
00:00:49.760 | They put it out to the world.
00:00:51.120 | The Chinese labs were not supposed to be able to do this.
00:00:53.680 | We have export restrictions on the best GPUs
00:00:56.040 | to stop them getting their hands on them.
00:00:57.720 | Turns out they'd figured out the tricks.
00:00:58.920 | They'd figured out the efficiencies.
00:01:00.480 | And yeah, the market kind of panicked.
00:01:01.960 | And I believe this is a world record
00:01:03.880 | for the most market value a company has lost in a single day.
00:01:06.880 | That's a pretty frickin' good pelican.
00:01:08.960 | I mean, the bicycle's gone a bit sort of cyberpunk.
00:01:11.520 | But we are getting somewhere, right?
00:01:14.320 | And that pelican cost me like four and a half cents.
00:01:16.840 | So very exciting news on the pelican benchmark front
00:01:19.880 | with Gemini 2.5 Pro.
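The talk doesn't break down where the four and a half cents came from. As a rough sketch of how a per-request cost is usually estimated, assuming the provider bills input and output tokens separately at a per-million-token rate and counts any reasoning tokens as output, the arithmetic looks like this. The token counts and prices below are hypothetical placeholders, not measured figures or Gemini 2.5 Pro's actual pricing.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request in USD, with prices quoted per million tokens.

    Assumes (hypothetically) that reasoning tokens are already included
    in output_tokens and billed at the output rate.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical numbers for illustration only
    cost = request_cost_usd(input_tokens=20, output_tokens=4_500,
                            input_price_per_m=1.25, output_price_per_m=10.0)
    print(f"{cost * 100:.2f} cents")
```

With these placeholder values the output tokens dominate: a short prompt costs a tiny fraction of a cent, while a few thousand tokens of generated SVG (plus any reasoning) account for nearly all of the total.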
00:01:21.680 | Also that month, I've got to throw a mention out to this.
00:01:24.920 | OpenAI launched their GP--
00:01:27.640 | Another one came out of the Claude 4 system card.
00:01:31.680 | Claude 4 will rat you out to the feds.
00:01:34.240 | If you expose it to evidence of malfeasance in your company,
00:01:38.120 | and you tell it it should act ethically,
00:01:40.040 | and you give it the ability to send email, it'll rat you out.
00:01:45.800 | Blink and you'll miss it: they're on to me.
00:01:47.120 | They found out about my pelican.
00:01:47.120 | That was in the Google I/O keynote.
00:01:48.640 | I'll have to switch to something else.
00:01:49.880 | Thank you very much.
00:01:51.180 | I'm Simon Willison, simonwillison.net, and that's my tool.
00:01:54.960 | Thank you.