Udio, the Mysterious GPT Update, and Infinite Attention
It's been a strange 48 hours in the world of AI, with releases like Udio that have reminded millions of people what AI is capable of, and models that can pay you infinite attention. But we also got befuddling updates from OpenAI that suggest not all is smooth sailing.
I'll start, of course, with the new model on Udio.com and how musicians are reacting. Then I'll cover the perplexing manner of the release of GPT-4 Turbo with Vision, and touch on a fascinating new infinite-context paper from Google. But now, let's hear three 20-second extracts from Udio to give you an inkling, if you haven't already heard it, of what it's capable of.
And now for some, quite frankly, amazing AI-generated classical music. And next, something I'm going to bleep a little bit, but it represents the reaction of Uncharted Labs, the company behind Udio, to their servers going down.
And of course, I have been playing about with Udio like almost everyone has. Now, I'm not sure if this guy is talking about me, but I thought I'd let you know about this reaction. And how about a quick, direct comparison between Udio and Suno V3? Now, I prefer Udio there, but you do sometimes get complete gobbledygook.
Now, Will.i.am calls Udio the best tech on earth, and Uncharted Labs, the company behind Udio, he says is really aiming to be an ally for creatives and artists. It should of course be noted that Will.i.am is an investor in Udio, but again they repeat that Udio is about building AI tools to enable the next generation of music creators.
Now, of course, everyone has their own opinion, but let's get a taste of the reaction from musicians. One says it's pretty scary thinking about what is going to exist a year or two from now, and what it means for musicians, listeners, and the industry as a whole. The top comment says, "I would buy a band t-shirt, but never buy a shirt for an AI." Another: "I am a music professional, producer/composer. This is highly advanced, and I thought this stuff was years away." And one more: "I've already gone full circle with it, past the confusion and devastation, and now I'm just curious what Gregorian chant would sound like" mixed with a genre whose name I can't even pronounce. So, definitely a mixed reaction from musicians.
Personally, I don't think it's too much of an exaggeration to call this the ChatGPT moment for music. Suno often has a slight tinniness that gives it away for those not following AI, but with Udio, I think you could convince many people that they're listening to human-made music, just like ChatGPT felt like human text if you didn't look too closely. I could well see, before the end of this year, hundreds of millions of people using this kind of tool. Imagine every schoolchild in the world walking out of their lesson, in whichever language they speak, with a catchy tune about what they've learned. So yes, I do believe that Udio is the biggest news of this week.
But of course, we also had the mysterious release of a new GPT-4 Turbo model from OpenAI. They probably thought it wasn't enough of a step forward to give it a new name. The strangeness was the repeated emphasis on it being better than previous iterations. All the top players at OpenAI, like Greg Brockman and Mira Murati, tweeted out the news of the release. But strangely, for the first time, Sam Altman didn't. Now, this isn't about reading any tea leaves; it's just a very strange announcement from OpenAI.
I ran my own maths and logic benchmarks and couldn't see much of a difference. It failed the same questions that the January version of GPT-4 Turbo failed. Of course, the functionality did improve, with function calling now available alongside vision.
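To make that concrete, here's a minimal sketch of what combining an image input with function calling looks like via the OpenAI Python SDK. The tool definition, prompt, and image URL are made-up placeholders for illustration, not anything from OpenAI's announcement.

```python
# Illustrative only: one request that sends an image AND lets the model call a tool,
# the combination the April GPT-4 Turbo release added support for.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "log_chart_reading",  # hypothetical tool, just for this sketch
        "description": "Record a numeric value read off a chart image.",
        "parameters": {
            "type": "object",
            "properties": {
                "metric": {"type": "string"},
                "value": {"type": "number"},
            },
            "required": ["metric", "value"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",  # the April release
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Read the peak value off this chart and log it."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
    tools=tools,
)

# If the model decides to use the tool, its structured arguments appear here.
print(response.choices[0].message.tool_calls)
```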
But what intrigued me were the repeated claims that GPT-4's reasoning had been further improved. Naturally, on this channel, that's what I was most focused on.
Here, though, is some of the best benchmarking work that I could find. On the noted MATH benchmark from Dan Hendrycks, you could see a bump in its performance on the hardest style of questions, from 35% to around 45%. Even one level down, performance bumped up from 57% to 66%. The difference on the easier questions wasn't nearly as pronounced. It seems pretty clear that the dataset got augmented with some high-level mathematics. Then there's coding: you can't complain about contamination there, because the questions are sourced from after the model's training cutoff. And again, as you can see, performance has increased, particularly for harder questions. These are sourced from contests like LeetCode. And that applies not just to code generation, but also to self-repair. Again, though, we're not talking about massive leaps, just small bumps.
Here, though, is the clearest assessment, from Epoch AI. The diamond set of GPQA contains the hardest kind of graduate-level questions. We're talking Google-proof STEM questions that even PhDs find hard. And yes, there was a bump, maybe by 2% or 3%, but GPT-4 Turbo, April edition, is still a long way from mastering them.
Of course, the deeper question is whether or not this indicates some inherent limitation of simply training on more and more advanced data. It's as if the current paradigm can only go so far, even with better data. Of course, you can watch any of my other videos to see why I don't think that will be much of a barrier for long.
Now, it would be remiss of me not to spend a few seconds touching on two releases from the open-weights community. I'm not going to call it the open-source community, because they're not releasing their training data. I'm talking about the new Mixtral 8x22B Mixture-of-Experts model and Cohere's Command R+. Now, you can judge for yourself, but they land around the level of Claude 3 Sonnet, which is impressive but not the frontier. Some people may have expected the open-weights community to have caught up to GPT-4 by now, but that hasn't quite happened yet. Of course, let's wait to see if Llama 3 can further bridge that gap.
Now, before we get to Google, there was one more announcement of a model I want to touch on. So, as I've done once before on this channel, I reached out to the company to ask about a sponsorship. I've probably turned down thousands of sponsorship offers, but I'm happy to say that this part of the video is sponsored by them. And basically, the reason I reached out to them is because it's really darn good. I'm often transcribing videos, and rarely do transcription models get characters like GPT correct, let alone more obscure terms. So yes, Universal-1 is the model I personally use, and you can see some comparisons to other models here. It does seem to hallucinate less than Whisper, and it takes 38 seconds to process an hour of audio. Anyway, Universal-1 only came out about a week ago and I think it's epic, but let me know what you think. The link, of course, will be in the description.
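For what it's worth, here's a minimal sketch of transcribing a file with AssemblyAI's Python SDK, which is where Universal-1 lives. The API key and file path are placeholders, and I'm assuming the SDK's default model routes to the Universal-1 family.

```python
# Minimal transcription sketch with the AssemblyAI Python SDK (pip install assemblyai).
# Key and file path are placeholders; swap in your own.
import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_KEY"  # placeholder

transcriber = aai.Transcriber()
transcript = transcriber.transcribe("path/to/episode.mp3")  # local file or a URL

print(transcript.text)  # the full transcript as plain text
```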
But now, from yesterday, a quite fascinating paper from Google. It's about transformer models that could have infinite context. I must say, unusually for this channel, I haven't had a chance to finish the paper before talking about it, but I wanted to include it in this video for a reason. Of course, the prospect of feeding in entire libraries is fascinating. But my theory is that this approach might be behind Gemini 1.5's long-context ability.
If you remember, Gemini 1.5, whose API is now widely available, was able to process up to 10 million tokens. If you're not familiar with tokens, think of 10 million tokens as being around 8 million words. And if that's a daunting number, think of it as eight entire sets of the Harry Potter novels.
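As a rough back-of-the-envelope check of those numbers (assuming roughly 0.8 English words per token and roughly 1.1 million words across the seven Harry Potter books, both ballpark assumptions rather than figures from the video):

```python
# Ballpark conversion only; the words-per-token ratio and the Harry Potter
# word count are rough assumptions, not official figures.
tokens = 10_000_000
words_per_token = 0.8
harry_potter_series_words = 1_100_000  # approx. total across all seven books

words = tokens * words_per_token
print(f"{words:,.0f} words")                                         # ~8,000,000 words
print(f"~{words / harry_potter_series_words:.1f} Harry Potter sets")  # roughly 7-8 sets
```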
Now, on the day that Gemini 1.5 came out, I called it the biggest development of that day, despite it being the same day that Sora came out. Gemini 1.5 could find metaphorical needles in videos three hours long, or in audio 22 hours long. And the performance just kept improving, up to and beyond 10 million tokens. But back to yesterday's paper: why do I think there's any link?
Now, one hint is that one of the authors, Manaal Faruqui (and sorry if I'm mispronouncing your name), was also an author on the original Gemini papers. The other hint comes from the paper itself, where they call their approach a "plug-and-play long-context adaptation capability" with which they can "continually pre-train existing LLMs". In other words, it appears you can take existing LLMs and simply continue pre-training them with this approach to make them great at long context, or indeed infinite context.
Is that part of what happened to Gemini 1.0 Pro to turn it into Gemini 1.5 Pro? Anyway, it is interesting that Google published this while still being a bit cagey about how Gemini 1.5 actually works. They do conclude, though, that this approach enables LLMs to process infinitely long contexts with bounded memory and computation resources.
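To give a flavour of how that bounded-memory trick might work, here's a rough NumPy sketch of the compressive-memory attention the paper describes: ordinary softmax attention within each segment, a linear-attention-style memory carried across segments, and a gate mixing the two. This is my own simplified reading, not Google's code; details like causal masking, per-head gates, and the delta-rule memory update are omitted.

```python
# Simplified sketch of "Infini-attention": a fixed-size compressive memory (M, z)
# is carried across segments, so context length can grow without memory growing.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, a positive feature map used for the memory read/write
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_segment(Q, K, V, M, z, beta):
    """Process one segment of n tokens with head dimension d.
    M: (d, d) compressive memory from earlier segments; z: (d,) its normaliser;
    beta: scalar gate mixing the memory read-out with local attention.
    Returns the segment output and the updated (M, z)."""
    d = Q.shape[-1]
    # 1) ordinary dot-product attention within the current segment
    A_local = softmax(Q @ K.T / np.sqrt(d)) @ V
    # 2) linear-attention-style read-out from the compressive memory
    sQ = elu_plus_one(Q)
    A_mem = (sQ @ M) / (sQ @ z + 1e-6)[:, None]
    # 3) write this segment's keys/values into the memory (fixed size, no growth)
    sK = elu_plus_one(K)
    M = M + sK.T @ V
    z = z + sK.sum(axis=0)
    # 4) gate between the long-term (memory) and local read-outs
    g = 1.0 / (1.0 + np.exp(-beta))
    out = g * A_mem + (1.0 - g) * A_local
    return out, M, z

# Toy usage: stream segments through; M and z stay (d, d) and (d,) throughout.
d, n = 64, 128
M, z = np.zeros((d, d)), np.zeros(d)
for _ in range(4):  # four segments of 128 tokens each
    Q, K, V = (np.random.randn(n, d) for _ in range(3))
    out, M, z = infini_attention_segment(Q, K, V, M, z, beta=0.0)
```

The key point is that the memory stays the same size however many segments you stream through, which is exactly the bounded memory and computation claim.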
Now, I am going to consult with some colleagues before I say much more about this paper, but the implications are worth pausing on. Imagine a model being able to process every film made by a particular director, or every work of French literature from a particular period, or every email that you've ever sent.
But let's not get too far ahead of ourselves, because it's not like Google don't have their own problems. This week we learned that, apparently, Demis Hassabis said he thought it would be especially difficult for Google to catch up to its rival OpenAI on generated video. He also apparently mused about leaving Google and raising billions of dollars to start a new lab. If he did leave to start his own lab, that would swiftly become a very competitive lab.
To bring us back to the start, that's actually how Udio was born. We learned from The Information that Udio is the work of Uncharted Labs, made up primarily of former Google DeepMind researchers. Those researchers had created the model Lyria back in the spring of last year. That could be a very similar model to what we now have in Udio, but Google didn't unveil it until November of last year and still hasn't made it available to the public. It seems like Demis Hassabis isn't the only one with some frustration at Google.
But before I end the video, I must give Google great credit for this release within the last few days. With deep learning, of course, they trained these ultra-cute football players. These two players weren't manually designed to do the moves they're doing. Through deep reinforcement learning, they learnt to anticipate ball movements and block opponent shots. And these agents were trained in simulation, which I talked about in my recent NVIDIA video. Compared to a pre-scripted baseline, these agents walked three times faster and turned four times faster. Soon, therefore, we could have our own mini Erling Haaland.
As always, let me know what you think in the comments.
But regardless, thank you so much for watching and have a wonderful day.