Back to Index

GPT-4.5 Gossip Crushed But a 100T Transformer Model Coming? Plus ByteDance + the Mixtral Price Drop


Transcript

With everyone listening to fruits and flowers these days on Twitter/X, it was hard to ignore the GPT-4.5 rumours, especially with the model itself hallucinating that name, something I too experienced. But now, with denials from three OpenAI employees, I was relieved to be able to focus instead on the real news.

This video is going to cover Etched's transformer chip, the Mixtral price spiral, Midjourney V6, and the ByteDance saga, with a guest appearance from none other than Sébastien Bubeck. He's one of the authors of Sparks of AGI and of the Phi series of models, and I interviewed him less than 36 hours ago.

But first, let's get to those GPT-4.5 denials. Here's Will Depue, a senior member of OpenAI. He was asked, "GPT-4.5 Turbo discovery: legit or no?" He said, "No, it's a very weird and oddly consistent hallucination." Interestingly, someone else then speculated, "I wonder if putting 'You are GPT-4.5 Turbo' in the system prompt improves performance?" And he said, "Quite possible.

That would be quite funny if they've done that, even though it's not GPT-4.5." By the way, I don't think they have. Next, we have none other than Sam Altman. He was asked, strangely by the same guy, "GPT-4.5 leak: legit or no?" And he does seem to have an in with OpenAI, because he got a response from Sam Altman saying, "Nah." I found that quite funny: the whole hype bubble burst in a single three-letter word.

But just in case there was any doubt, we then get this from Roon. He is an OpenAI employee, but somewhat undercover. Anyway, he said, "You guys need to develop more resistance to crazy AI hype bros. There's no 4.5, and if there was, it wouldn't be released silently, and if it was released silently, you wouldn't have the API string self-dox as 4.5." And another OpenAI employee replied, "The saddest thing about all of this is that people's expectations for a GPT-4.5 are so low."

I have a suite of questions that I use myself to test every new model. The questions mainly focus on mathematics, and I tested GPT-4 during the height of the hype bubble. And no, nothing too significant had changed, which is why I agree with Adrien Ecoffet: I think we'd notice more of a difference with GPT-4.5. But moving on to real news, what is the Etched transformer chip that I mentioned?

Well, there is very little publicly available information on this, but I've dug up what I could find. Essentially, this new company, Etched.ai, claims to have the world's first transformer supercomputer, one designed from the ground up to run transformers. The transformer architecture is of course used in large language models, but it's also used in computer vision and in audio and multimodal processing.

Anyway, this company claims to have burned the transformer architecture onto a chip. As they say, it's transformers etched into silicon. Their custom chip, codenamed Sohu, which I'll get to in a moment, apparently massively outperforms NVIDIA's H100 on tokens-per-second inference. Translated, that would allow real-time interaction.

What I'm going to do at this point, just before continuing with the technical specs, is give you a bit of background from an article that I think hardly anyone else has read. The article describes a pair of 21-year-old Harvard dropouts raising multiple millions to design an AI accelerator chip dedicated to large language models.

Basically, a company betting everything on transformers and large language models. This article came out in May and gets especially interesting towards the middle. The two co-founders, seen below, decided to start a chip company to design a more efficient inference architecture for large language models. Remember, inference isn't about the training of a large language model, it's about generating its outputs.

Anyway, here's the key quote: "You can't get the kind of improvements we're getting by being generalized." You'll see more of those improvements in a moment, but they go on: "You have to bet hard on a single architecture. Not just on general AI, but on something more specific. We think eventually NVIDIA will do this.

We think the opportunity is too big to ignore." Later on, we learn that they're betting everything on a particular architecture. There are others being explored, like Mamba, but they're betting everything on transformers. The article goes on: "The rapid evolution of workloads in the AI space could spell disaster if Etched.ai specializes too much." And the co-founder admitted as much: "That's a real risk, and I think it's turning off a lot of other people from going down this route, but transformers aren't changing," he said.

But if the bet pays off, what does that mean? Well, first of all, it could mean 140 times the throughput per dollar compared to NVIDIA's H100. Think real-time interactions with models at a very cheap price. The website gives the example of real-time voice agents, where models running on this etched-in architecture could ingest thousands of words in milliseconds.

None of those awkward pauses between asking your model something by voice and hearing the reply. And of course, generating many more outputs much more quickly means that you can compare among them and pick the best, just like we do with self-consistency and just like AlphaCode 2 did. And if you don't know about AlphaCode 2, do check out my video on Gemini.
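To give you a flavour of that "generate many, pick the best" idea, here's a minimal sketch of self-consistency: sample the same prompt several times at non-zero temperature and keep the most common final answer. The sample_answer function is just a placeholder for whatever model call you would actually make; cheaper, faster inference is exactly what makes a large sample count practical.

```python
from collections import Counter

def sample_answer(prompt: str) -> str:
    """Placeholder for one sampled completion from whichever model you use.
    In practice this would be an API or local-inference call with temperature > 0,
    so that repeated calls can disagree."""
    raise NotImplementedError

def self_consistent_answer(prompt: str, n_samples: int = 16) -> str:
    """Sample the model several times and return the most common answer,
    i.e. a simple majority vote over independent generations."""
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```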

Just quickly though, what might it mean to burn the architecture onto a chip? Well, here's the somewhat simplified version. Transformers are typically run on general-purpose GPUs, which can be used for plenty of other things, and those GPUs are then optimized through software to run LLMs. The Etched supercomputers, in contrast, would be dedicated hardware designed from the ground up to run transformers.

Essentially, by etching the transformer architecture directly into the silicon, every transistor could be optimized specifically for transformer computations, such as matrix multiplication, which is the core calculation going on in large language models. So yes, alas, that poem on pomegranates was the result of a ton of multiplications and additions.
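If you want to see that concretely, here is a tiny, illustrative single-head attention computation in plain NumPy. The dimensions are made up and it is nothing like production code, but notice that almost every step is a matrix multiplication (the @ operator), which is exactly the kind of workload you would bake into silicon.

```python
import numpy as np

# Toy single-head self-attention, just to show that almost all of the work
# is matrix multiplication (the "@" operator). Dimensions are made up.
seq_len, d_model = 8, 64
x = np.random.randn(seq_len, d_model)                     # token embeddings
W_q, W_k, W_v = (np.random.randn(d_model, d_model) for _ in range(3))

Q, K, V = x @ W_q, x @ W_k, x @ W_v                       # three matmuls
scores = (Q @ K.T) / np.sqrt(d_model)                     # another matmul
scores -= scores.max(axis=-1, keepdims=True)              # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V                                      # and one more matmul
```

The feed-forward blocks that follow attention are more matrix multiplications again, so a chip that does nothing but this, very fast, covers most of what a transformer needs.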

The plan, apparently, is to fully open source the software stack, which in my view would be a strategic way to draw people in to depend on the new hardware. And specializing hardware like this could unlock 100x gains, if architectures don't radically change in the next year.

Look at the number that they promise: it scales up to 100-trillion-parameter models. That would be about 60 times the size of GPT-4. Or, to summarize everything, it's designed from the ground up for GPT-5. In terms of dates, it's potentially not that far away, with the chip slated to be available in 2024.

They say that they're doing a Series A at the beginning of next year, seeking funds. They say most investors are skeptical and rightfully so because what they see is a pair of undergrads trying to tackle the semiconductor industry. So if all of this is next year, why am I covering it now?

Well, firstly, because if it works, it would be crazy transformative for generative AI. And secondly, let's be honest, if this is the thing that changes the industry, I want you to be able to say that you heard it here first. But it's now time for something that's available today.

That's Mistral's mixture-of-experts model, Mixtral. That's why it's called the Mixtral 8x7 billion parameter model. It's open source and matches or beats GPT-3.5, not only on benchmarks and leaderboards, where it matches GPT-3.5 and beats Gemini Pro, but, more significantly, on price. A week ago, when the Mixtral model was announced, it was $2 per 1 million tokens.
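Before we get to the price war, a quick word on the name. "Mixture of experts" means that instead of one big feed-forward block per layer, the model has several smaller "expert" blocks plus a router that sends each token to just a few of them; in Mixtral's case, reportedly two of eight experts per layer are active for any given token. Here's a rough, illustrative sketch of top-2 routing for a single token; none of this is Mixtral's actual code, and every name and dimension is made up.

```python
import numpy as np

def moe_layer(token, experts, router_W, top_k=2):
    """Illustrative top-k mixture-of-experts routing for a single token:
    score every expert, run only the best top_k, and mix their outputs
    using the renormalised router weights."""
    logits = router_W @ token                               # one score per expert
    top = np.argsort(logits)[-top_k:]                       # indices of the chosen experts
    gate = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over just those experts
    return sum(g * experts[i](token) for g, i in zip(gate, top))

# Toy usage: 8 "experts", each just a random linear map over a 16-dimensional token.
d = 16
experts = [lambda t, W=np.random.randn(d, d): W @ t for _ in range(8)]
router_W = np.random.randn(8, d)
out = moe_layer(np.random.randn(d), experts, router_W)     # shape (16,)
```

Because only around two experts' worth of weights are touched per token, serving the model is far cheaper than serving a dense model of the same total size, which brings us back to that launch price of $2 per 1 million tokens.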

Hours later, Together Compute dropped the pricing by 70%. Days later, it was cut 50% further by Abacus AI, to $0.30 per 1 million tokens. Then, just three days ago, DeepInfra went to $0.27 per million tokens. And you can probably see where I'm going here. What happened? Well, right now, one provider is offering Mixtral for free, for both input and output.
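Just to sanity-check that chain of cuts, here is the rough arithmetic, taking the quoted percentages at face value:

```python
# Rough sanity check of the quoted price cuts (all figures in $ per 1M tokens).
launch    = 2.00                                 # Mixtral at announcement
together  = round(launch * (1 - 0.70), 2)        # ~70% cut by Together -> 0.60
abacus    = round(together * (1 - 0.50), 2)      # ~50% further by Abacus AI -> 0.30
deepinfra = 0.27                                 # DeepInfra's quoted price
print(together, abacus, deepinfra)               # 0.6 0.3 0.27
```

In other words, roughly a sevenfold drop within a week, before one provider took it all the way to zero.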

Models are not only getting more capable at a dramatic rate, they're getting cheaper at an even more dramatic rate. Combine the two, and it really makes you wonder where we'll be at the end of 2024 for intelligence per dollar. How about, for example, GPT-4-level reasoning in a 13 billion parameter model?

Here's what Sébastien Bubeck, one of the lead authors of Sparks of AGI and of the Phi series of models, told me just 36 hours ago: "It's completely an open question at the moment what kind of capabilities we're going to be able to achieve. We don't know. But from what I'm seeing in terms of the performance we're able to extract at 1 billion, at 3 billion, and what I know of the big models like GPT-4, I think there is room, at least for the reasoning part, to be enabled already at 13 billion parameters."

He's so driven in this mission to solve reasoning that his eyes aren't even focused on getting LLMs onto a phone. That's despite recent news that we can fit up to 10 billion parameters on-device, as Qualcomm have boasted. I put it to him: "I thought that you were deliberately targeting around 10 billion to get on a phone, but it seems like not?"

"No, not necessarily. I mean, the bar of being on the phone is a bar that I like. But for me personally, it's really more about the scientific quest: what are the minimal ingredients that are needed to achieve a level of intelligence similar to something like GPT-4?

That's a real question to me." If you're interested in learning more about the Phi-2 model, I've done a video on it on my channel. And if you're interested in the full, juicy interview, it's on AI Insiders. You also get podcasts, exclusive tutorials, and a personal message from me.

Of course, I appreciate that not everyone can afford the $24 something I think it works out at for the annual sub, so I deeply appreciate you just watching these videos and leaving comments and likes. And of course, all the research I do for AI Insiders massively benefits my analysis for the main AI Explained channel.

Two more fun bits though before I go. First, ByteDance, a multi-hundred billion dollar company, is apparently secretly using OpenAI's tech to build a competitor. They really just don't want to get caught. Of course, using the outputs of models like GPT-4 to improve other models has long been known about.

It's a key technique behind the Orca series of models and indeed the Phi series of models, both of which were, of course, done in partnership with Microsoft. But it's actually against the terms and conditions for other companies to do that. And as the headline says, the frenzied race to win in generative AI means that even the biggest players are cutting corners.
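Mechanically, the recipe is straightforward, which is part of why it's so tempting: collect a strong model's answers to a set of prompts and use them as synthetic training data for a smaller model. Here's a minimal sketch, assuming the official openai Python client (v1 or later); the prompt list and output file are placeholders, and the actual fine-tuning of the smaller model isn't shown.

```python
# Minimal sketch of output-based distillation: gather a strong model's answers
# to a list of prompts and save them as instruction/response pairs, ready to be
# used as fine-tuning data for a smaller model. Prompts and paths are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompts = ["Explain self-attention in two sentences."]  # placeholder prompt set

with open("synthetic_train.jsonl", "w") as f:
    for prompt in prompts:
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        pair = {"instruction": prompt, "response": reply.choices[0].message.content}
        f.write(json.dumps(pair) + "\n")
```

Scale the prompt set up into the millions and you have something like the Orca recipe, at least in spirit.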

ByteDance, by the way, are behind TikTok and as the article restates, it's in direct violation of OpenAI's Terms of Service, which state that model output can't be used to "develop any AI models that compete with our products and services." Nevertheless, internal ByteDance documents confirm that the OpenAI API has been relied on to develop its foundational LLM, codenamed Project Seed, apparently during nearly every phase of development, including for training and evaluating the model.

Employees are apparently well aware of the implications, and they had plans to whitewash the evidence. In response, about 36 hours ago, OpenAI banned ByteDance from ChatGPT due to, and here's some irony, possible data theft. I invite you to let me know in the comments what you think of all of that.

Now, I did warn you at the start about Twitter rumors, but I'm going to make one slight exception, for two reasons: first, because it was more of a statement, and second, because it comes from the head of research at ByteDance. His name is Quanquan Gu, and he said recently that he's uncertain about GPT-5, but that a super strong model, more powerful than Gemini, is expected to arrive any time now. You might have thought he meant GPT-4.5, but no.

He was asked about open source, and he said, "Open model weights." Is he referring to Project Seed here? Well, time will tell. And when asked, "When open source catches up with GPT-4, do you think OpenAI will just wait for that to happen?", he said, "We don't settle for catching up with GPT-4.

We outpace GPT-5." I'm not holding my breath on Project Seed, but let me know what you think. Let me end with these preview images for Midjourney V6. Just as I observed with Imagen 2, the real breakthrough for me seems to be the added level of photorealism. Of course, I'd love to hear what you think.

There is still a slight smoothness to them, but when you upscale them using Magnific, the woman, if not the text in the background, looks exceptionally realistic. As always, let me know what you think. Thank you so much for watching to the end, and have a wonderful day. (upbeat music)