
GPT-4.5 Gossip Crushed, But a 100T Transformer Model Coming? Plus ByteDance + the Mixtral Price Drop



00:00:00.000 | With everyone listening to fruits and flowers these days on Twitter/X, it was hard to ignore
00:00:06.460 | the GPT-4.5 rumours, especially with those hallucinations from the model itself, which I
00:00:11.800 | too experienced.
00:00:12.800 | But now, with denials from 3 OpenAI employees, I was relieved to be able to focus instead
00:00:19.160 | on the real news.
00:00:20.640 | This video is going to cover etched-in transformers, the Mixtral price spiral, Midjourney
00:00:25.840 | V6, and the ByteDance saga, with a guest appearance from none other than Sébastien Bubeck.
00:00:31.920 | He's one of the authors of Sparks of AGI and the Phi series of models, and I interviewed
00:00:36.760 | him less than 36 hours ago.
00:00:39.000 | But first, let's get to those GPT 4.5 denials.
00:00:41.720 | Here's Will Depue, a senior member of OpenAI.
00:00:45.120 | He was asked, "GPT-4.5 Turbo discovery, legit or no?"
00:00:49.320 | He said, "No, it's a very weird and oddly consistent hallucination."
00:00:53.960 | Interestingly, someone else then speculated, "I wonder if putting 'You are GPT-4.5 Turbo' in
00:00:58.280 | the system prompt improves performance?"
00:01:00.400 | And he said, "Quite possible.
00:01:02.240 | That would be quite funny if they've done that even though it's not GPT 4.5."
00:01:06.640 | By the way, I don't think they have.
00:01:08.240 | Next, we have none other than Sam Altman.
00:01:10.520 | He was asked, strangely by the same guy, "GPT 4.5 leak legit or no?"
00:01:16.360 | And he does seem to have an in with OpenAI because he got a response from Sam Altman
00:01:20.560 | saying, "Nah."
00:01:21.560 | I found that quite funny: the whole hype bubble burst with a single three-letter word.
00:01:26.360 | But just in case there was any doubt, we then get this from Roon.
00:01:30.280 | He is an OpenAI employee but somewhat undercover.
00:01:33.120 | Anyway, he said, "You guys need to develop more resistance to crazy AI hype bros.
00:01:38.160 | There's no 4.5 and if there was, it wouldn't be released silently and if it was released
00:01:43.120 | silently, you wouldn't have the API string self-dox as 4.5."
00:01:47.440 | And another OpenAI employee replied, "The saddest thing about all of this is that people's
00:01:51.180 | expectations for a GPT-4.5 are so low."
00:01:54.160 | I have a suite of questions that I use myself to test every new model.
00:01:58.080 | The questions mainly focus on mathematics, and I tested GPT-4 during the height of the
00:02:02.160 | hype bubble.
00:02:03.160 | And no, nothing too significant had changed, which is why I agree with Adrien Ecoffet:
00:02:07.320 | I think we'd notice more of a difference with GPT-4.5.
00:02:10.560 | But moving on to real news, what is the etched-in transformer that I mentioned?
00:02:14.440 | Well, there is very little publicly available information on this, but I've dug up what
00:02:18.400 | I could find.
00:02:19.400 | Essentially, this new company, Etched.ai, claims to have the world's first transformer supercomputer.
00:02:25.000 | One designed from the ground up to run transformers.
00:02:27.960 | The transformer architecture is of course used in large language models, but it's
00:02:31.520 | also used in computer vision and in audio and multimodal processing.
00:02:35.560 | Anyway, this company claims to have burned the transformer architecture onto a chip.
00:02:40.600 | As they say, it's transformers etched into silicon.
00:02:43.920 | Their custom chip, codenamed Sohu, which I'll get to in a moment, as you can see, apparently
00:02:48.040 | massively outperforms NVIDIA's H100 on tokens per second during inference.
00:02:53.000 | Translated, that would allow real-time interaction.
00:02:55.160 | What I'm going to do at this point, just before continuing with the technical specs,
00:02:58.400 | is I'm going to give you a bit of background from an article that I think hardly anyone
00:03:02.080 | else has read.
00:03:03.080 | The article describes a pair of 21-year-old Harvard dropouts raising multiple millions
00:03:09.240 | to design an AI accelerator chip dedicated to large language model acceleration.
00:03:13.920 | Basically, a company betting everything on transformers and large language models.
00:03:18.520 | This article came out in May and gets especially interesting towards the middle.
00:03:22.000 | The two co-founders, seen below, decided to start a chip company to design a more efficient
00:03:26.800 | inference architecture for large language models.
00:03:29.320 | Remember, inference isn't about the training of a large language model, it's about generating
00:03:33.220 | its outputs.
00:03:34.220 | Anyway, here's the key quote.
00:03:35.600 | You can't get the kind of improvements we're getting by being generalized.
00:03:39.280 | You'll see more of the improvements in a moment, but they go on.
00:03:42.040 | You have to bet hard on a single architecture.
00:03:45.220 | Not just on general AI, but on something more specific.
00:03:47.720 | We think eventually NVIDIA will do this.
00:03:49.860 | We think the opportunity is too big to ignore.
00:03:52.400 | Later on, we learn that they're betting everything on a particular architecture.
00:03:56.000 | There are others being tested like Mamba, but they're betting everything on transformers.
00:04:00.300 | The article goes on, "The rapid evolution of workloads in the AI space could spell disaster
00:04:04.960 | if Etched.ai specializes too much."
00:04:07.200 | And the co-founder admitted as much, "That's a real risk and I think it's turning off
00:04:10.760 | a lot of other people from going down this route, but transformers aren't changing,"
00:04:14.600 | he said.
00:04:15.600 | But if the bet pays off, what does that mean?
00:04:17.500 | Well, first of all, it could mean 140 times the throughput per dollar.
00:04:21.600 | Think real-time interactions with models at a very cheap price.
00:04:24.880 | That's as compared to the NVIDIA H100.
00:04:27.480 | The website gives the example of real-time voice agents, where models built on this etched-in
00:04:31.680 | architecture could ingest thousands of words in milliseconds.
00:04:35.640 | None of those awkward pauses after asking your model something by voice.
00:04:39.520 | And of course, generating many more outputs much more quickly means that you can compare
00:04:43.640 | among them and pick the best, just like we do with self-consistency and just like Alphacode
00:04:48.520 | 2 did.
00:04:49.520 | And if you don't know about Alphacode 2, do check out my video on Gemini.
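
To make that sampling idea concrete, here is a minimal self-consistency sketch in Python. It's a generic illustration, not anything from Etched or AlphaCode 2, and the `generate` function is a placeholder you would wire up to whatever model you are running; the point is simply that cheap, fast inference makes sampling many candidates and majority-voting over them practical.

```python
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder: one sampled answer from whichever LLM you have access to."""
    raise NotImplementedError("wire this up to your model or API of choice")

def self_consistent_answer(prompt: str, n_samples: int = 16) -> str:
    # Fast, cheap inference makes a large n_samples affordable.
    answers = [generate(prompt) for _ in range(n_samples)]
    # Majority vote: keep the answer the model converges on most often.
    return Counter(answers).most_common(1)[0][0]
```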
00:04:53.040 | Just quickly though, what might it mean to burn the architecture onto a chip?
00:04:56.680 | Well, here's the somewhat simplified version.
00:04:59.160 | Transformers are typically run on general purpose GPUs, which could be used for other
00:05:03.280 | things.
00:05:04.280 | And those GPUs are then optimized through software to run LLMs.
00:05:08.160 | The etched supercomputers, in contrast, would potentially be dedicated hardware designed
00:05:13.220 | from the ground up to run transformers.
00:05:15.320 | Essentially, by etching the transformer architecture directly into the silicon, every transistor
00:05:19.800 | could be optimized specifically for transformer computations.
00:05:23.420 | Like for example, matrix multiplication, which is the core calculation going on in large
00:05:27.840 | language models.
00:05:28.840 | So yes, alas, that poem on pomegranates was the result of a ton of multiplications and
00:05:34.880 | additions.
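
To give a feel for what "transformer computations" means in practice, here is a toy NumPy sketch of a single attention head. It has nothing to do with Sohu's actual design, but nearly every line is a matrix multiplication, which is exactly the kind of operation dedicated silicon would hard-wire.

```python
import numpy as np

def toy_attention(x, Wq, Wk, Wv):
    """One attention head: almost everything here is a matrix multiplication."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv             # query/key/value projections (matmuls)
    scores = q @ k.T / np.sqrt(q.shape[-1])       # token-to-token similarity (matmul)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v                             # weighted mix of values (matmul)

# Tiny example: 4 tokens, model width 8.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(toy_attention(x, Wq, Wk, Wv).shape)  # (4, 8)
```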
00:05:35.880 | The plan apparently is to fully open source the software stack.
00:05:39.240 | And open sourcing the software stack would be a strategic way, in my view, to draw people
00:05:44.560 | in to depend on that new hardware.
00:05:46.900 | And specializing hardware like this could unlock 100x gains if architectures don't
00:05:52.240 | radically change in the next year.
00:05:53.960 | Look at the number that they promise.
00:05:55.840 | It's scalable up to 100 trillion parameter models.
00:05:59.720 | That would be about 60 times the size of GPT-4.
00:06:03.560 | Or to kind of summarize everything, it's designed from the ground up for GPT-5.
00:06:08.460 | In terms of dates, it's potentially not that far away, with the chip slated
00:06:12.780 | to be available in 2024.
00:06:14.800 | They say that they're doing a Series A at the beginning of next year, seeking funds.
00:06:19.240 | They say most investors are skeptical and rightfully so because what they see is a pair
00:06:23.720 | of undergrads trying to tackle the semiconductor industry.
00:06:27.680 | So if all of this is next year, why am I covering it now?
00:06:30.600 | Well, firstly, because if it works, it would be crazy transformative for generative AI.
00:06:36.040 | And secondly, let's be honest, if this is the thing that changes the industry, I want
00:06:40.000 | you to be able to say that you heard it here first.
00:06:42.400 | But it's now time for something that's available today.
00:06:45.400 | That's Mistral's mixture of experts.
00:06:47.560 | That's why it's called Mixtral, an 8x7 billion parameter model.
00:06:51.480 | It's open source and matches or beats GPT-3.5, not only in benchmarks and on leaderboards,
00:06:58.120 | where it matches GPT-3.5 and beats Gemini Pro, but also, more significantly, in price.
00:07:03.620 | A week ago, when the Mixtral model was announced, it was $2 per 1 million tokens.
00:07:09.040 | Hours later, Together Compute dropped the pricing by 70%.
00:07:12.560 | Days later, it was cut 50% further by Abacus AI to $0.30 per 1 million tokens.
00:07:18.800 | Then just three days ago, Deep Infra went to $0.27 per million tokens.
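
Just to check that those cuts add up, here is the back-of-the-envelope arithmetic in Python; the prices and percentages are only the ones reported above.

```python
price = 2.00            # Mistral launch price, $ per 1M Mixtral tokens
price *= 1 - 0.70       # Together Compute drops pricing by 70%  -> $0.60
price *= 1 - 0.50       # Abacus AI cuts a further 50%           -> $0.30
print(f"${price:.2f} per 1M tokens")  # $0.30, before Deep Infra's $0.27 and the free tier
```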
00:07:23.680 | And you can probably see where I'm going here.
00:07:25.560 | What happened?
00:07:26.560 | Well, one provider is now offering Mixtral for free, for both input and output.
00:07:32.160 | Models are not only gaining performance at a dramatic rate, they're getting cheaper
00:07:36.440 | at an even more dramatic rate.
00:07:38.160 | Combine the two and it really makes you wonder where we'll be at the end of 2024 for intelligence
00:07:43.260 | per dollar.
00:07:44.260 | How about, for example, GPT-4 level reasoning on a 13 billion parameter model?
00:07:49.280 | Here's what Sébastien Bubeck, one of the lead authors of Sparks of AGI and the Phi series
00:07:54.480 | of models, told me just 36 hours ago.
00:07:57.440 | And it's completely an open question at the moment what kind of capabilities we're going
00:08:01.240 | to be able to achieve.
00:08:02.360 | We don't know.
00:08:03.360 | But from what I'm seeing in terms of the performance we're able to extract at 1 billion, at 3 billion,
00:08:08.720 | and what I know of the big models like GPT-4, I think there is room, at least for the reasoning
00:08:14.180 | part, to be enabled already at 13 billion parameters.
00:08:18.680 | He's so driven in this mission to solve reasoning that his eyes aren't even focused on getting
00:08:23.320 | LLMs onto a phone.
00:08:25.000 | That's despite recent news that we can fit up to 10 billion parameters on device as Qualcomm
00:08:30.160 | have boasted.
00:08:31.160 | I thought that you were deliberately targeting around 10 billion to get on a phone, but it
00:08:35.080 | seems like not.
00:08:36.080 | No, not necessarily.
00:08:37.080 | I mean, the bar of being on the phone, I mean, it's a bar that I like.
00:08:41.520 | But for me personally, it's really more about the scientific quest of what are the minimal
00:08:47.080 | ingredients that are needed to achieve a level of intelligence which is similar to something
00:08:52.200 | like GPT-4.
00:08:53.200 | That's a real question to me.
00:08:54.460 | If you're interested in learning more about the Phi-2 model, I've done a video
00:08:58.200 | on it on my channel.
00:08:59.520 | And if you're interested in the full, juicy interview, it's on AI Insiders.
00:09:02.880 | You also get podcasts, exclusive tutorials, and a personal message from me.
00:09:07.380 | Of course, I appreciate that not everyone can afford the $24 something I think it works
00:09:11.680 | out at for the annual sub, so I deeply appreciate you just watching these videos and leaving
00:09:16.480 | comments and likes.
00:09:17.680 | And of course, all the research I do for AI Insiders massively benefits my analysis for
00:09:22.840 | the main AI Explained channel.
00:09:24.800 | Two more fun bits though before I go.
00:09:26.800 | First, ByteDance, a multi-hundred billion dollar company, is apparently secretly using
00:09:32.320 | OpenAI's tech to build a competitor.
00:09:34.740 | They really just don't want to get caught.
00:09:36.920 | Of course, using the outputs of models like GPT-4 to improve other models has long been
00:09:42.440 | known about.
00:09:43.440 | It's a key technique behind the Orca series of models and indeed the Phi series of models.
00:09:47.160 | Both of those were, of course, in partnership with Microsoft.
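
For context, "using the outputs of models like GPT-4 to improve other models" usually means generating synthetic instruction-response pairs to fine-tune a smaller model on, roughly like the hedged Python sketch below. This is a generic illustration of that distillation recipe, not ByteDance's, Orca's, or Phi's actual pipeline, and the seed instructions and output file name are invented for the example.

```python
import json
from openai import OpenAI  # the official openai>=1.0 Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical seed instructions; a real pipeline would use far more, and far more varied ones.
seed_instructions = [
    "Explain why the sky is blue to a ten-year-old.",
    "Write a Python function that reverses a linked list.",
]

with open("synthetic_pairs.jsonl", "w") as f:
    for instruction in seed_instructions:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": instruction}],
        )
        answer = response.choices[0].message.content
        # Each line becomes an (instruction, output) pair for fine-tuning a smaller model.
        f.write(json.dumps({"instruction": instruction, "output": answer}) + "\n")
```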
00:09:50.120 | But it's actually against the terms and conditions for other companies to do that.
00:09:54.000 | But as the headline says, the frenzied race to win in generative AI means that even the
00:09:58.520 | biggest players are cutting corners.
00:10:00.520 | ByteDance, by the way, are behind TikTok and as the article restates, it's in direct
00:10:04.800 | violation of OpenAI's Terms of Service, which state that model output can't be used
00:10:09.840 | to "develop any AI models that compete with our products and services."
00:10:14.520 | Nevertheless, internal ByteDance documents confirm that the OpenAI API has been relied
00:10:19.820 | on to develop its foundational LLM, codenamed Project Seed, apparently during nearly every
00:10:25.280 | phase of development, including for training and evaluating the model.
00:10:29.600 | Employees apparently are well aware of the implications and they had plans to whitewash
00:10:34.280 | the evidence.
00:10:35.280 | In response, about 36 hours ago, OpenAI banned ByteDance from ChatGPT due to, and here's
00:10:41.360 | some irony, possible data theft.
00:10:43.480 | I invite you to let me know in the comments what you think of all of that.
00:10:47.240 | Now, I did warn you at the start about Twitter rumors, but I'm going to make one slight
00:10:50.780 | exception for two reasons.
00:10:52.480 | First, because it was more of a statement and second, because it comes from the head
00:10:56.360 | of research at ByteDance.
00:10:58.120 | His name is Quanquan Gu, and he said this recently: he's uncertain about GPT-5,
00:11:02.360 | but a super strong model, more powerful than Gemini, is expected to arrive any time now.
00:11:07.520 | You might have thought he meant GPT-4.5, but no.
00:11:10.240 | He was asked about open source, and he said, "Open model weights."
00:11:13.720 | Is he referring to Project Seed here?
00:11:15.920 | Well, time will tell.
00:11:17.120 | And when asked, "When open source catches GPT-4, do you think OpenAI will just wait
00:11:21.160 | for that to happen?"
00:11:22.240 | He said, "We don't settle for catching up with GPT-4.
00:11:25.460 | We outpace GPT-5."
00:11:26.800 | I'm not holding my breath on Project Seed, but let me know what you think.
00:11:31.280 | Let me end with these preview images for Midjourney V6.
00:11:36.080 | Just as I observed with Imagen 2, the real breakthrough for me seems to be the added
00:11:40.920 | level of photorealism.
00:11:42.880 | Of course, I'd love to hear what you think.
00:11:45.080 | There is still a slight smoothness to them, but when you upscale them using Magnific,
00:11:50.720 | then the woman, if not the text in the background, looks exceptionally realistic.
00:11:56.060 | As always, let me know what you think.
00:11:57.760 | Thank you so much for watching to the end and have a wonderful day.
00:12:01.680 | (upbeat music)