
Udio, the Mysterious GPT Update, and Infinite Attention



00:00:00.000 | It's been a strange 48 hours in the world of AI with releases like Udio that have reminded
00:00:06.440 | millions of people what AI is capable of and models that can pay you infinite attention.
00:00:13.080 | But we also got befuddling updates from OpenAI that suggest that not all is smooth sailing.
00:00:18.960 | I'll start, of course, with the new model on Udio.com and how musicians are reacting.
00:00:25.880 | Then cover the perplexing manner of the release of GPT-4 Turbo with Vision and touch on a
00:00:31.560 | fascinating new Infinite Context paper from Google.
00:00:35.360 | But now let's hear three 20-second extracts from Udio to give you an inkling, if you
00:00:41.120 | haven't heard it already, of what it's capable of.
00:00:44.160 | Here's Dune, the Broadway musical.
00:01:06.840 | And now for some quite frankly amazing AI-generated classical music.
00:01:25.640 | And next, something I'm going to bleep a little bit, but represents the reaction of Uncharted
00:01:31.320 | Labs, who are behind Udio, to their servers going down.
00:01:44.300 | And of course, I have been playing about with Udio like almost everyone has, and did you
00:02:03.100 | know it can do stand-up comedy?
00:02:21.820 | Now I'm not sure if this guy is talking about me, but I thought I'd let you know that this
00:02:25.660 | kind of thing is possible.
00:02:27.460 | And how about a quick, direct comparison between Udio and Suno V3?
00:02:41.220 | Now, I prefer Udio there, but you do sometimes get complete gobbledygook.
00:03:10.100 | Now will.i.am calls Udio the best tech on earth, and Uncharted Labs, which is the company
00:03:22.420 | behind Udio, he says is really aiming to be an ally for creatives and artists.
00:03:28.420 | Now it should of course be noted that will.i.am is an investor in Udio, but again they repeat
00:03:33.500 | that Udio is about building AI tools to enable the next generation of music creators.
00:03:39.460 | Now of course everyone has their own opinion, but let's now get a taste of the reaction
00:03:43.060 | from some musicians.
00:03:44.880 | One says it's pretty scary thinking what is going to exist a year or two from now, and
00:03:49.980 | what it means for musicians, listeners, and the industry as a whole.
00:03:53.660 | The top comment says I would buy a band t-shirt, but never buy a shirt for an AI, which makes
00:03:59.300 | sense.
00:04:00.300 | But here are two more common reactions.
00:04:02.660 | I am a music professional, producer/composer.
00:04:05.740 | This is highly advanced, and I thought this stuff was years away.
00:04:10.140 | And one more, I've already gone full circle with it, past the confusion and devastation,
00:04:15.180 | and now I'm just curious what Gregorian chant would sound like with, I can't even pronounce
00:04:19.900 | that, and blast beats.
00:04:21.660 | So definitely a mixed reaction from musicians.
00:04:24.860 | Personally, I don't think it's too much of an exaggeration to call this the ChatGPT
00:04:29.460 | moment for music generation.
00:04:31.660 | Suno often has a slight tinniness that gives it away for those not following AI, but with
00:04:37.340 | Udio, I think you could convince many people that they're listening to human music, just
00:04:42.620 | like ChatGPT felt like human text if you didn't look too closely.
00:04:46.940 | I could well see before the end of this year, hundreds of millions of people using this
00:04:51.620 | for entertainment.
00:04:52.820 | Imagine every school child in the world walking out of their lesson in whichever language
00:04:57.860 | with a catchy tune about what they've learned.
00:05:00.380 | So yes, I do believe that Udio is the biggest news of this week.
00:05:05.100 | But of course, we had the mysterious release of a new GPT-4 Turbo model from OpenAI.
00:05:12.020 | And why do I call it mysterious?
00:05:13.580 | Well, not because it wasn't named GPT-4.5.
00:05:17.300 | They probably thought it wasn't enough of a step forward to give it that name.
00:05:21.520 | The strangeness was the repeated emphasis on it being better than previous iterations,
00:05:27.380 | but without any detail.
00:05:28.640 | They called it majorly improved.
00:05:30.940 | Where are the benchmarks though?
00:05:32.680 | And now here's some more mystery.
00:05:34.340 | All the top players at OpenAI like Greg Brockman and Mira Murati tweeted out the news of the
00:05:40.500 | new model.
00:05:41.500 | But strangely, for the first time, Sam Altman didn't.
00:05:44.780 | Now this isn't about reading any tea leaves, it's just a very strange announcement from
00:05:49.380 | OpenAI.
00:05:50.380 | I ran my own maths and logic benchmarks and I couldn't see much of a difference.
00:05:54.540 | It failed the same questions that the January version of GPT-4 Turbo failed.
00:05:59.260 | Of course, the functionality improved with function calling within vision.
00:06:03.100 | But what intrigued me was the repeated claims that GPT-4 reasoning had been further improved.
00:06:09.380 | Naturally, on this channel, that's what I was most focused about.
00:06:12.900 | The cutting edge of intelligence.
00:06:15.180 | Here though is some of the best benchmarking work that I could find.
00:06:19.100 | On the noted MATH benchmark from Dan Hendrycks, you could see a bump in its performance on
00:06:24.520 | the hardest style of questions, from 35% to around 45%.
00:06:29.480 | Even one level down, the performance bumped up from 57% to 66%.
00:06:34.380 | The difference on the easier questions wasn't nearly as pronounced.
00:06:37.940 | It seems pretty clear that the dataset got augmented with some high-level mathematics
00:06:42.860 | and code.
00:06:43.860 | Otherwise, it wasn't too much changed.
00:06:46.460 | Here's another example, LiveCodeBench.
00:06:48.680 | You can't complain about contamination because they source their questions from after the
00:06:52.660 | training date of the models.
00:06:54.660 | And again, as you can see, performance has increased, particularly for harder questions.
00:06:59.340 | These are sourced from contests like LeetCode.
00:07:02.140 | And that applies not just to code generation, but self-repair.
00:07:06.300 | Again though, we're not talking about massive leaps, just small bumps.
00:07:10.140 | Here though is the clearest assessment from Epoch AI.
00:07:13.780 | The diamond set of the GPQA are the hardest kind of graduate questions.
00:07:19.460 | We're talking Google-proof STEM questions that even PhDs find hard.
00:07:24.860 | And yes, there was a bump, maybe by 2% or 3%, but GPT-4 Turbo, April edition, is still
00:07:32.500 | lower performing than Claude 3 Opus.
00:07:35.420 | Of course, the deeper question is whether or not this indicates some inherent limitations
00:07:41.020 | on just simply training on more and more advanced data.
00:07:44.740 | It's a bit like the current paradigm can only go so far, even with better data.
00:07:49.740 | Of course, you can watch any of my other videos to see why I don't think that will be much
00:07:54.420 | of a bottleneck that much longer.
00:07:56.580 | Now it would be remiss of me not to spend a few seconds touching on two releases from
00:08:01.360 | the open-weights community.
00:08:03.140 | I'm not going to call it the open source community because they're not releasing their training
00:08:07.340 | datasets.
00:08:08.340 | I'm talking about the new Mixtral 8x22 billion Mixture of Experts model and Cohere's
00:08:14.340 | Command R+.
00:08:15.780 | Now you can judge for yourself, but they land around the level of Claude 3 Sonnet, which
00:08:20.780 | is the medium-sized model.
00:08:22.680 | Of course, that is a proprietary model.
00:08:24.900 | Some people may have expected the open-weights community to have caught up to GPT-4 by now,
00:08:30.300 | but that's not quite the case.
00:08:32.060 | Of course, let's wait to see if Llama 3 can further bridge that gap.
00:08:36.620 | Now before we get to Google, there was one more announcement of a model I want to touch on.
00:08:42.380 | So as I've done once before on this channel, I reached out to the company to ask about
00:08:47.500 | a sponsorship.
00:08:48.500 | I've probably turned down thousands of sponsorship offers, but I'm happy to say that this part
00:08:53.140 | of the video is sponsored by AssemblyAI.
00:08:56.040 | So what happened?
00:08:57.040 | They released Universal-1.
00:08:58.900 | And basically the reason I reached out to them is because it's really darn good.
00:09:03.520 | I'm often transcribing videos and rarely do they get characters like GPT correct, let
00:09:09.320 | alone names like Satya Nadella.
00:09:12.120 | Universal-1 did.
00:09:13.580 | So yes, Universal-1 is the model I personally use and you can see some comparisons to other
00:09:19.820 | models in this chart.
00:09:21.260 | It does seem to hallucinate less than Whisper and takes 38 seconds to process an hour of
00:09:27.940 | audio.
00:09:28.940 | Anyway, Universal-1 only came out like a week ago and I think it's epic, but let me
00:09:33.780 | know what you think.
00:09:34.940 | The link, of course, will be in the description.
00:09:37.700 | But now from yesterday, a quite fascinating paper from Google.
00:09:42.100 | It's about transformer models that could have infinite context.
00:09:46.460 | Not 1 million or 10 million, but infinite.
00:09:49.420 | I must say unusually for this channel, I haven't had a chance to finish the paper before talking
00:09:54.460 | about it.
00:09:55.460 | I wanted to include it in this video for a reason.
00:09:57.980 | Of course, the prospect of feeding in entire libraries is fascinating.
00:10:02.680 | But my theory is that this approach might be behind Gemini 1.5's long context ability.
00:10:09.260 | If you remember, Gemini 1.5, whose API is now widely available, was able to process
00:10:15.340 | up to at least 10 million tokens.
00:10:18.940 | Notice the phrase "at least" there.
00:10:20.720 | If you're not familiar with tokens, think 10 million tokens as being around 8 million
00:10:24.860 | words.
00:10:25.860 | And if that's a daunting number, think 8 entire sets of Harry Potter novels.
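A quick sanity check of those figures, as a minimal sketch. The ratio of 0.8 words per token and the figure of roughly one million words per complete Harry Potter series are assumptions implied by the numbers quoted above, not measured values; real tokenizers vary by language and text.

```python
# Rough conversion from tokens to words, using assumed ratios.
TOKENS = 10_000_000
WORDS_PER_TOKEN = 0.8              # assumption: implied by "10 million tokens ~ 8 million words"
WORDS_PER_HP_SERIES = 1_000_000    # assumption: rough size of one full set of the novels


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate the word count for a given token count."""
    return round(tokens * words_per_token)


words = tokens_to_words(TOKENS)
sets_of_novels = words / WORDS_PER_HP_SERIES

print(f"{TOKENS:,} tokens is roughly {words:,} words")
print(f"That is about {sets_of_novels:.0f} complete sets of the novels")
```

Changing either assumed ratio shifts the answer proportionally, so the result is best read as an order-of-magnitude guide.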
00:10:30.300 | Now, on the day that Gemini 1.5 came out, I called it the biggest development of that
00:10:35.460 | day, despite it being the same day that Sora came out.
00:10:38.780 | I would still stick to that to this day.
00:10:41.220 | Gemini 1.5 could find metaphorical needles in videos 3 hours long or audio 22 hours long.
00:10:48.620 | And the performance just kept improving up to and beyond 10 million tokens.
00:10:53.900 | But back to yesterday's paper, why do I think there's any link?
00:10:57.100 | Now, one hint is that one of the authors, Manaal Faruqui, and sorry if I'm mispronouncing
00:11:01.300 | your name, was also an author in the original Gemini papers.
00:11:06.340 | The other hint comes from the paper itself, where they call their approach a "plug-and-play
00:11:11.460 | long-context adaptation capability" with which they can "continually pre-train existing
00:11:18.300 | LLMs".
00:11:19.300 | In other words, it appears like you can take existing LLMs and just pre-train them with
00:11:23.500 | this approach to make them great at long-context, or indeed infinite-context.
00:11:28.860 | Is that part of what happened to Gemini 1 Pro to turn it into Gemini 1.5 Pro?
00:11:34.300 | Anyway, it is interesting that Google published this, while still being a bit cagey about
00:11:39.380 | some crucial details.
00:11:40.920 | They do conclude though that this approach enables LLMs to process infinitely long context,
00:11:47.260 | even though they've got bounded memory and computation resources.
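To make the idea of unbounded input with bounded memory concrete, here is a toy sketch of a fixed-size associative memory in the style of linear attention. It is an illustrative assumption, not the paper's confirmed implementation: the `BoundedMemory` class, the ELU+1 feature map, and the dimensions are all hypothetical choices made for this example.

```python
import math


def feature(x):
    """ELU(x) + 1 feature map: keeps every feature strictly positive."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


class BoundedMemory:
    """Fixed-size memory: a d_key x d_value matrix plus a d_key normaliser.

    Its size depends only on the dimensions, never on how many
    segments of input have been written into it.
    """

    def __init__(self, d_key, d_value):
        self.M = [[0.0] * d_value for _ in range(d_key)]
        self.z = [0.0] * d_key

    def write(self, keys, values):
        """Fold one segment of (key, value) pairs into the memory."""
        for k, v in zip(keys, values):
            fk = feature(k)
            for i, fki in enumerate(fk):
                self.z[i] += fki
                row = self.M[i]
                for j, vj in enumerate(v):
                    row[j] += fki * vj

    def read(self, query):
        """Retrieve a normalised, weighted blend of the stored values."""
        fq = feature(query)
        denom = sum(a * b for a, b in zip(fq, self.z))
        d_value = len(self.M[0])
        if denom == 0.0:
            return [0.0] * d_value
        return [
            sum(fq[i] * self.M[i][j] for i in range(len(fq))) / denom
            for j in range(d_value)
        ]

    def size(self):
        """Number of floats held, independent of the input length."""
        return len(self.M) * len(self.M[0]) + len(self.z)


memory = BoundedMemory(d_key=4, d_value=2)
for _ in range(1000):  # stream 1000 segments through the same memory
    memory.write([[0.1, -0.2, 0.3, 0.0]], [[1.0, 2.0]])
print(memory.size())
```

The trade-off in a design like this is that older content is compressed into the same fixed matrix rather than stored verbatim, so recall is approximate once many different keys have been written.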
00:11:51.220 | Now I am going to consult with some colleagues before I say much more about this paper, but
00:11:55.780 | just think about some of the possibilities.
00:11:58.620 | Imagine a model being able to process every film made by a particular director, or every
00:12:04.260 | work of French literature between a particular period, or every email that you've ever sent
00:12:10.100 | since birth.
00:12:11.100 | But let's not get too far ahead of ourselves because it's not like Google don't have their
00:12:15.120 | own issues.
00:12:16.280 | This week we learned that apparently Demis Hassabis said that he thought it would be
00:12:20.140 | especially difficult for Google to catch up to its rival OpenAI with generated video.
00:12:26.700 | He also apparently mused about leaving Google and raising billions of dollars to start a
00:12:31.460 | new research lab.
00:12:33.220 | If he did leave to start his own lab, that would swiftly become a very competitive lab.
00:12:39.360 | To bring us back to the start, that's actually how Udio was born.
00:12:43.240 | We learned from The Information that Udio is the work of Uncharted Labs, made up primarily
00:12:49.300 | of former Google DeepMind staff.
00:12:51.780 | Those researchers had created the model Lyria back in the spring of last year.
00:12:57.060 | That could be a very similar model to what we now have in Udio, but the company didn't
00:13:02.460 | unveil it until November of last year and Google still hasn't made it available to
00:13:07.020 | the public.
00:13:08.020 | It seems like Demis Hassabis isn't the only one with some frustration at Google.
00:13:12.820 | But before I end the video, I must give Google great credit for this release within the last
00:13:18.100 | 24 hours.
00:13:19.260 | With deep learning, of course, they trained these ultra cute football players.
00:13:23.940 | And yes, I'm calling it football.
00:13:25.500 | These two players weren't manually designed to do the moves they're doing.
00:13:29.580 | Through deep reinforcement learning, they learnt to anticipate ball movements and block
00:13:35.060 | opponent shots.
00:13:36.380 | And these guys were trained in simulation, which I talked about in my recent NVIDIA video.
00:13:41.380 | Compared to a pre-scripted baseline, these agents walked three times faster, turned four
00:13:46.820 | times faster and kicked the ball 30% faster.
00:13:50.460 | Soon therefore, we could have our own mini Erling Haaland.
00:13:53.880 | So quite the rollercoaster 48 hours in AI.
00:13:57.800 | As always, let me know what you think in the comments.
00:14:00.720 | Feel free to hop on board my Patreon.
00:14:03.180 | But regardless, thank you so much for watching and have a wonderful day.