GPT-4.5 Gossip Crushed But a 100T Transformer Model Coming? Plus ByteDance + the Mixtral Price Drop
With everyone listening to fruits and flowers these days on Twitter/X, it was hard to ignore 00:00:06.460 |
the GPT-4.5 rumours, especially with those hallucinations in the model itself, which I'll get to. 00:00:12.800 |
But now, with denials from three OpenAI employees, I was relieved to be able to focus instead on other news. 00:00:20.640 |
This video is going to cover Etched's transformers, the Mixtral price spiral, Midjourney 00:00:25.840 |
V6, and the ByteDance saga, with a guest appearance from none other than Sébastien Bubeck. 00:00:31.920 |
He's one of the authors of Sparks of AGI and the Phi series of models, and I interviewed him recently. 00:00:39.000 |
But first, let's get to those GPT-4.5 denials. 00:00:41.720 |
Here's Will Depue, a senior member of OpenAI. 00:00:45.120 |
He was asked, "GPT-4.5 Turbo discovery, legit or no?" 00:00:49.320 |
He said, "No, it's a very weird and oddly consistent hallucination." 00:00:53.960 |
Interestingly, someone else then speculated, "I wonder if putting 'you are GPT-4.5 Turbo' in the system prompt is behind this. 00:01:02.240 |
That would be quite funny if they've done that, even though it's not GPT-4.5." 00:01:10.520 |
Sam Altman was then asked, strangely by the same guy, "GPT-4.5 leak, legit or no?" 00:01:16.360 |
And he does seem to have an in with OpenAI, because he got a response from Sam Altman himself, who simply replied, "Nah." 00:01:21.560 |
I found that quite funny: the whole hype bubble burst by a single three-letter word. 00:01:26.360 |
But just in case there was any doubt, we then get this from Roon. 00:01:30.280 |
He is an OpenAI employee but somewhat undercover. 00:01:33.120 |
Anyway, he said, "You guys need to develop more resistance to crazy AI hype bros. 00:01:38.160 |
There's no 4.5 and if there was, it wouldn't be released silently and if it was released 00:01:43.120 |
silently, you wouldn't have the API string self-dox as 4.5." 00:01:47.440 |
And another OpenAI employee replied that the saddest thing about all of this is how easily people's impressions of a model can be swayed. 00:01:54.160 |
I have a suite of questions that I use myself to test every new model. 00:01:58.080 |
They're questions that mainly focus on mathematics, and I tested GPT-4 during the height of the rumours. 00:02:03.160 |
And no, nothing too significant had changed, and that's why I agree with Adrien Ecoffet. 00:02:07.320 |
I think we'd notice more of a difference with GPT-4.5. 00:02:10.560 |
But moving on to real news, what is the Etched transformer chip that I mentioned? 00:02:14.440 |
Well, there is very little publicly available information on this, but I've dug up what I can. 00:02:19.400 |
Essentially, this new company, Etched.ai, claims to have the world's first transformer supercomputer. 00:02:25.000 |
One designed from the ground up to run transformers. 00:02:27.960 |
The transformer architecture is of course used in large language models, but it's 00:02:31.520 |
also used in computer vision and in audio and multimodal processing. 00:02:35.560 |
Anyway, this company claims to have burned in the transformer architecture onto a chip. 00:02:40.600 |
As they say, it's transformers etched into silicon. 00:02:43.920 |
Their custom chip, codenamed Sohu, which I'll get to in a moment, as you can see, massively 00:02:48.040 |
outperforms, apparently, NVIDIA's H100 on tokens per second inference. 00:02:53.000 |
Translated, that would allow real-time interaction. 00:02:55.160 |
What I'm going to do at this point, just before continuing with the technical specs, 00:02:58.400 |
is I'm going to give you a bit of background from an article that I think hardly anyone has seen. 00:03:03.080 |
The article describes a pair of 21-year-old Harvard dropouts raising multiple millions 00:03:09.240 |
to design an AI accelerator chip dedicated to large language model acceleration. 00:03:13.920 |
Basically, a company betting everything on transformers and large language models. 00:03:18.520 |
This article came out in May and gets especially interesting towards the middle. 00:03:22.000 |
The two co-founders, seen below, decided to start a chip company to design a more efficient 00:03:26.800 |
inference architecture for large language models. 00:03:29.320 |
Remember, inference isn't about the training of a large language model, it's about generating outputs from a model that's already been trained. 00:03:35.600 |
You can't get the kind of improvements we're getting by being generalized. 00:03:39.280 |
You'll see more of the improvements in a moment, but they go on. 00:03:42.040 |
You have to bet hard on a single architecture. 00:03:45.220 |
Not just on general AI, but on something more specific. 00:03:49.860 |
We think the opportunity is too big to ignore. 00:03:52.400 |
Later on, we learn that they're betting everything on a particular architecture. 00:03:56.000 |
There are others being tested like Mamba, but they're betting everything on transformers. 00:04:00.300 |
The article goes on to say that the rapid evolution of workloads in the AI space could spell disaster for a company specializing this narrowly. 00:04:07.200 |
And the co-founder admitted as much, "That's a real risk and I think it's turning off 00:04:10.760 |
a lot of other people from going down this route, but transformers aren't changing," he said. 00:04:15.600 |
But if the bet pays off, what does that mean? 00:04:17.500 |
Well, first of all, it could mean 140 times the throughput per dollar. 00:04:21.600 |
Think real-time interactions with models at a very cheap price. 00:04:27.480 |
The website gives the example of real-time voice agents where models built on this etched 00:04:31.680 |
in architecture could ingest thousands of words in milliseconds. 00:04:35.640 |
None of those awkward pauses between asking your model something using voice and hearing its reply. 00:04:39.520 |
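To make that concrete, here's a rough back-of-the-envelope sketch of the kind of latency we're talking about. Both figures below are purely illustrative assumptions on my part, not numbers Etched has published.

```python
# Rough, back-of-the-envelope latency sketch for a long voice prompt.
# Both figures below are illustrative assumptions, not numbers published by Etched.
words_of_context = 5_000                     # "thousands of words"
tokens = int(words_of_context * 1.33)        # ~1.33 tokens per English word (rule of thumb)
assumed_tokens_per_second = 500_000          # hypothetical prefill throughput

latency_ms = tokens / assumed_tokens_per_second * 1_000
print(f"Ingesting {tokens:,} tokens would take roughly {latency_ms:.0f} ms")
# -> Ingesting 6,650 tokens would take roughly 13 ms
```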
And of course, generating many more outputs much more quickly means that you can compare 00:04:43.640 |
among them and pick the best, just like we do with self-consistency and just like AlphaCode 2 does. 00:04:49.520 |
And if you don't know about AlphaCode 2, do check out my video on Gemini. 00:04:53.040 |
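If you're curious what that looks like in practice, here's a minimal sketch of the self-consistency idea. The `generate` function is just a stand-in for whatever model call you have available, and this is not AlphaCode 2's actual pipeline, which adds filtering, clustering and a scoring model on top.

```python
from collections import Counter

def self_consistency(generate, prompt, n=16):
    """Minimal sketch of self-consistency: sample n candidate answers at a
    non-zero temperature and return the most common one. The `generate`
    argument is a stand-in for whatever model call you have available;
    it is not a real library API."""
    answers = [generate(prompt, temperature=0.8) for _ in range(n)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n  # winning answer plus a crude agreement score

# Usage, with any callable that takes (prompt, temperature) and returns a string:
# answer, agreement = self_consistency(my_model_call, "What is 17 * 24?")
```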
Just quickly though, what might it mean to burn the architecture onto a chip? 00:04:56.680 |
Well, here's the somewhat simplified version. 00:04:59.160 |
Transformers are typically run on general purpose GPUs, which could be used for other workloads too. 00:05:04.280 |
And those GPUs are then optimized through software to run LLMs. 00:05:08.160 |
The etched supercomputers, in contrast, would potentially be dedicated hardware designed solely to run transformers. 00:05:15.320 |
Essentially, by etching the transformer architecture directly into the silicon, every transistor 00:05:19.800 |
could be optimized specifically for transformer computations. 00:05:23.420 |
Like, for example, matrix multiplication, which is the core calculation going on in large language models. 00:05:28.840 |
So yes, alas, that poem on pomegranates was the result of a ton of multiplications and additions. 00:05:35.880 |
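To make the matrix multiplication point concrete, here's a stripped-down single-head attention layer in NumPy. The names and shapes are just illustrative, and a real transformer adds multi-head projections and MLP layers, but nearly every heavy operation is still a matmul, which is exactly what a transformer-specific chip would hard-wire.

```python
import numpy as np

def single_head_attention(x, Wq, Wk, Wv):
    """Stripped-down self-attention: nearly every heavy operation here is a
    matrix multiply, which is the computation a transformer-specific chip
    would hard-wire into silicon."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv                   # three matmuls
    scores = (Q @ K.T) / np.sqrt(K.shape[-1])          # another matmul
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)          # softmax (cheap by comparison)
    return weights @ V                                 # one more matmul

# Toy example: 8 tokens, model width 16 (shapes are purely illustrative)
x = np.random.randn(8, 16)
Wq, Wk, Wv = (np.random.randn(16, 16) for _ in range(3))
out = single_head_attention(x, Wq, Wk, Wv)             # shape (8, 16)
```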
The plan apparently is to fully open source the software stack. 00:05:39.240 |
And open sourcing the software stack would be a strategic way, in my view, to draw people onto their platform. 00:05:46.900 |
And specializing hardware like this could unlock 100x gains if architectures don't change. 00:05:55.840 |
It's scalable up to 100 trillion parameter models. 00:05:59.720 |
That would be about 60 times the size of GPT-4. 00:06:03.560 |
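For what it's worth, GPT-4's parameter count has never been officially confirmed, but if you take the widely circulated rumour of roughly 1.7 trillion parameters, the "about 60 times" figure does check out:

```python
# Sanity check on the "about 60 times GPT-4" claim, assuming the unconfirmed
# rumour that GPT-4 has roughly 1.7 trillion parameters:
print(100e12 / 1.7e12)   # ~58.8, i.e. roughly 60x
```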
Or to kind of summarize everything, it's designed from the ground up for GPT-5. 00:06:08.460 |
In terms of dates, it's potentially not that far away, with the chip being slated for next year. 00:06:14.800 |
They say that they're doing a Series A at the beginning of next year, seeking funds. 00:06:19.240 |
They say most investors are skeptical and rightfully so because what they see is a pair 00:06:23.720 |
of undergrads trying to tackle the semiconductor industry. 00:06:27.680 |
So if all of this is next year, why am I covering it now? 00:06:30.600 |
Well, firstly, because if it works, it would be crazy transformative for generative AI. 00:06:36.040 |
And secondly, let's be honest, if this is the thing that changes the industry, I want 00:06:40.000 |
you to be able to say that you heard it here first. 00:06:42.400 |
But it's now time for something that's available today. 00:06:47.560 |
I'm talking about Mixtral, a sparse mixture of eight experts of around 7 billion parameters each; that's why it's called the Mixtral 8x7 billion parameter model. 00:06:51.480 |
It's open source and matches or beats GPT-3.5, not only in benchmarks and on leaderboards, 00:06:58.120 |
where it matches GPT-3.5 and beats Gemini Pro, but also, more significantly, in price. 00:07:03.620 |
A week ago, when the Mixtral model was announced, it was $2 per 1 million tokens. 00:07:09.040 |
Hours later, Together Compute dropped the pricing by 70%. 00:07:12.560 |
Days later, it was cut 50% further by Abacus AI to $0.30 per 1 million tokens. 00:07:18.800 |
Then just three days ago, Deep Infra went to $0.27 per million tokens. 00:07:23.680 |
And you can probably see where I'm going here. 00:07:26.560 |
Well, as of now, one provider is offering Mixtral for free, for both input and output tokens. 00:07:32.160 |
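Just to put that price war in one place, here's the cascade as quoted, in dollars per million tokens; the only figures used are the ones mentioned above.

```python
# The Mixtral price cascade described above, in dollars per one million tokens.
# Figures are the ones quoted in the video, nothing more.
launch = 2.00                        # price at announcement
together = launch * (1 - 0.70)       # Together Compute's 70% cut  -> 0.60
abacus = together * (1 - 0.50)       # Abacus AI's further 50% cut -> 0.30
deep_infra = 0.27                    # Deep Infra's listed price
free_tier = 0.00                     # at least one provider now serves it free

for label, price in [("launch", launch), ("Together", together),
                     ("Abacus", abacus), ("Deep Infra", deep_infra),
                     ("free", free_tier)]:
    print(f"{label:>10}: ${price:.2f} per 1M tokens")
```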
Models are not only getting more performant at a dramatic rate, they're getting cheaper at a dramatic rate too. 00:07:38.160 |
Combine the two and it really makes you wonder where we'll be at the end of 2024 for intelligence per dollar. 00:07:44.260 |
How about, for example, GPT-4 level reasoning on a 13 billion parameter model? 00:07:49.280 |
Here's what Sébastien Bubeck, one of the lead authors of Sparks of AGI and the Phi series of models, had to say. 00:07:57.440 |
And it's completely an open question at the moment what kind of capabilities we're going to be able to reach at these small scales. 00:08:03.360 |
But from what I'm seeing in terms of the performance we're able to extract at 1 billion, at 3 billion, 00:08:08.720 |
and what I know of the big models like GPT-4, I think there is room, at least for the reasoning 00:08:14.180 |
part, to be enabled already at 13 billion parameters. 00:08:18.680 |
He's so driven in this mission to solve reasoning that his eyes aren't even focused on getting these models onto phones. 00:08:25.000 |
That's despite recent news that we can fit up to 10 billion parameters on device, as Qualcomm has announced. 00:08:31.160 |
I thought that you were deliberately targeting around 10 billion to get on a phone, but it seems that's not the main goal. 00:08:37.080 |
I mean, the bar of being on the phone, I mean, it's a bar that I like. 00:08:41.520 |
But for me personally, it's really more about the scientific quest of what are the minimal 00:08:47.080 |
ingredients that are needed to achieve a level of intelligence which is similar to something like human intelligence. 00:08:54.460 |
If you're interested in learning more about the Phi-2 series of models, I've done a video on them. 00:08:59.520 |
And if you're interested in the full, juicy interview, it's on AI Insiders. 00:09:02.880 |
You also get podcasts, exclusive tutorials, and a personal message from me. 00:09:07.380 |
Of course, I appreciate that not everyone can afford the $24 or so that I think it works 00:09:11.680 |
out at for the annual sub, so I deeply appreciate you just watching these videos and leaving a like or a comment. 00:09:17.680 |
And of course, all the research I do for AI Insiders massively benefits my analysis for these main videos. 00:09:26.800 |
First, ByteDance, a multi-hundred billion dollar company, is apparently secretly using OpenAI's technology to develop its own rival model. 00:09:36.920 |
Of course, using the outputs of models like GPT-4 to improve other models has long been common practice. 00:09:43.440 |
It's a key technique behind the Orca series of models and indeed the Phi series of models. 00:09:47.160 |
Both of those were, of course, in partnership with Microsoft. 00:09:50.120 |
But it's actually against the terms and conditions for other companies to do that. 00:09:54.000 |
But as the headline says, the frenzied race to win in generative AI means that even the biggest players are willing to cut corners. 00:10:00.520 |
ByteDance, by the way, are behind TikTok and as the article restates, it's in direct 00:10:04.800 |
violation of OpenAI's Terms of Service, which state that model output can't be used 00:10:09.840 |
to "develop any AI models that compete with our products and services." 00:10:14.520 |
Nevertheless, internal ByteDance documents confirm that the OpenAI API has been relied 00:10:19.820 |
on to develop its foundational LLM, codenamed Project Seed, apparently during nearly every 00:10:25.280 |
phase of development, including for training and evaluating the model. 00:10:29.600 |
Employees apparently are well aware of the implications, and they had plans to whitewash the evidence. 00:10:35.280 |
In response, about 36 hours ago, OpenAI banned ByteDance from ChatGPT due to, and here's the key phrase, suspected violations of its terms of service. 00:10:43.480 |
I invite you to let me know in the comments what you think of all of that. 00:10:47.240 |
Now, I did warn you at the start about Twitter rumors, but I'm going to make one slight exception. 00:10:52.480 |
First, because it was more of a statement, and second, because it comes from the head of ByteDance's LLM effort. 00:10:58.120 |
His name is Quanquan Gu, and he said this recently: he's uncertain about GPT-5, 00:11:02.360 |
but a super strong model, more powerful than Gemini, is expected to arrive any time now. 00:11:07.520 |
You might have thought he meant GPT-4.5, but no. 00:11:10.240 |
He was asked about open source, and he said, "Open model weights." 00:11:17.120 |
And when asked, "When open source catches GPT-4, do you think OpenAI will just wait?", 00:11:22.240 |
he said, "We don't settle for catching up with GPT-4." 00:11:26.800 |
I'm not holding my breath on Project Seed, but let me know what you think. 00:11:31.280 |
Let me end with these preview images for Midjourney V6. 00:11:36.080 |
Just as I observed with Imagen 2, the real breakthrough for me seems to be the added realism. 00:11:45.080 |
There is still a slight smoothness to them, but when you upscale them using Magnific, 00:11:50.720 |
then the woman, if not the text in the background, looks exceptionally realistic. 00:11:57.760 |
Thank you so much for watching to the end and have a wonderful day.