There are all of these benchmarks full of numbers. I don't like the numbers. There are the leaderboards. I'm kind of beginning to lose trust in the leaderboards as well. So for my own work, I've been leaning increasingly into my own little benchmark, which started as a joke and has actually turned into something that I've learned quite a lot from.
And that's this. I prompt models with "Generate an SVG of a pelican riding a bicycle." I have good reasons for this. Firstly, these are not image models. These are text models. They shouldn't be able to draw anything at all. But they can output code, and SVG is a kind of code.
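As a rough illustration of why SVG works as a target for a text model, here is a minimal Python sketch that takes a model's text reply, pulls out the SVG markup, and checks that it parses as XML. The `extract_svg` helper and the `SAMPLE_REPLY` string are hypothetical and made up for illustration; no real model API is called, and this is not the speaker's actual tooling.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# The prompt quoted in the talk.
PROMPT = "Generate an SVG of a pelican riding a bicycle"

# Hypothetical stand-in for a model reply: plain text with SVG code inside it.
SAMPLE_REPLY = (
    "Here is the drawing:\n"
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">\n'
    '  <circle cx="30" cy="80" r="10"/>\n'
    '  <circle cx="70" cy="80" r="10"/>\n'
    '  <ellipse cx="50" cy="50" rx="15" ry="10"/>\n'
    "</svg>\n"
    "Hope you like it."
)


def extract_svg(response_text: str) -> Optional[str]:
    """Return the first well-formed <svg>...</svg> block in a reply, or None."""
    match = re.search(r"<svg\b.*?</svg>", response_text, re.DOTALL | re.IGNORECASE)
    if match is None:
        return None
    svg = match.group(0)
    try:
        # Parsing only checks that the markup is well-formed XML,
        # not that it looks anything like a pelican.
        ET.fromstring(svg)
    except ET.ParseError:
        return None
    return svg


if __name__ == "__main__":
    svg = extract_svg(SAMPLE_REPLY)
    print("valid SVG found" if svg else "no valid SVG in reply")
```

The extracted string can be saved to a `.svg` file and opened in a browser. Judging whether the picture is any good is still left to a human.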
So that works. Fast forward to January. In January, we get DeepSeek again. DeepSeek strike back. This is what happened to NVIDIA's stock price when DeepSeek R1 came out. I think it was the 27th of January. This was DeepSeek's first big reasoning model release. Again, open weights. They put it out to the world.
The Chinese labs were not supposed to be able to do this. We have trading restrictions on the best GPUs to stop them getting their hands on them. Turns out they'd figured out the tricks. They'd figured out the efficiencies. And yeah, the market kind of panicked. And I believe this is a world record for the most a company has dropped in a single day.
That's a pretty frickin' good pelican. I mean, the bicycle's gone a bit sort of cyberpunk. But we are getting somewhere, right? And that pelican cost me like 4 and 1/2 cents. So very exciting news on the pelican benchmark front with Gemini 2.5 Pro. Also that month, I've got to throw a mention out to this.
OpenAI launched their GP-- Another one, this came out of the Claude 4 system card. Claude 4 will rat you out to the feds. If you expose it to evidence of malfeasance in your company, and you tell it it should act ethically, and you give it the ability to send email, it'll rat you out.
Blink and you miss it, they're on to me. They found out about my pelican. That was in the Google I/O keynote. I'll have to switch to something else. Thank you very much. I'm Simon Willison, simonwillison.net, and that's my tool. Thank you.