
The AI News You Might Have Missed This Week - Zuckerberg to Falcon w/ SPQR



00:00:00.000 | Here are seven developments in AI that you might have missed this week from ChatGPT avatars to
00:00:06.040 | open source models on an iPhone and AlphaDev to Zuckerberg's projections of super intelligence.
00:00:12.800 | But first something a little unconventional with a modicum of wackiness: embodied VR chess.
00:00:20.480 | This robot on my left is being controlled by a human in a suit over there and this robot on my
00:00:24.960 | right is being controlled by a human over there. They both have feedback gloves, they have VR
00:00:30.360 | headsets, and they're seeing everything the robot sees. Now specifically today we're looking at
00:00:35.520 | avatars, robot avatars to be precise. They can play chess but they can do much more: they can
00:00:40.720 | perform maintenance, rescue operations, and do anything that a human can do with its hands and
00:00:45.280 | eyes. Could this be the future of sports and things like MMA where you fight using robotic
00:00:50.800 | embodied avatars? But for something a little less intense we have a robot that can do a lot more
00:00:54.940 | than just playing chess. We have this robot chef who learned by watching videos.
00:00:58.560 | It does make me wonder how long before we see something like this at a McDonald's near you.
00:01:14.780 | But now it's time to talk about something that is already available which is the HeyGen plugin
00:01:20.500 | in ChatGPT. It allows you to fairly quickly create an avatar video
00:01:24.920 | of the text produced by ChatGPT and I immediately thought of one use case that I think could take
00:01:31.140 | off in the near future. By combining the Wolfram plugin with HeyGen I asked ChatGPT to solve this
00:01:37.700 | problem and then output an explainer video using an avatar. A quick tip here is to tell ChatGPT
00:01:44.480 | the plugins that you want it to use otherwise it's kind of reluctant to do so. As you can see
00:01:49.800 | ChatGPT using Wolfram was able to get the question right but for some people it's a little bit
00:01:54.900 | more complicated. So check this out.
00:01:58.360 | The retail price of a certain kettlebell is $70. This price represents a 25% profit over the wholesale cost.
00:02:06.520 | To find the profit per kettlebell sold at retail price we first need to find the wholesale cost.
00:02:12.320 | We know that $70 is 125% of the wholesale cost.
00:02:17.460 | Next we have Runway Gen 2 which I think gives us a glimpse of what the future of text to video will be like.
00:02:24.880 | A long long time ago at Lady Winterbottom's lovely tea party which is in the smoking ruins and ashes
00:02:31.920 | of New York City. A fierce woman ain't playing no games and is out to kick some butts against the
00:02:37.520 | unimaginable brutal merciless and scary lobby boy of the delightful Grand Budapest Hotel.
00:02:42.740 | And everything seems doomed and lost until a super handsome man arises the true hero and
00:02:50.100 | great mastermind behind all of this. Now of course that's not perfect and as you can see
00:02:54.860 | from my brief attempt here there is lots to work on. But just remember where Midjourney was a year
00:03:00.380 | ago to help you imagine where Runway will be in a year's time. And speaking of a year's time if AI
00:03:06.140 | generated fake images are already being used politically imagine how they or fake videos are going
00:03:11.140 | to be used in a year's time. But now it's time for the paper that I had to read two or three times
00:03:16.340 | to grasp and it will be of interest to anyone who is following developments in open source models.
00:03:22.340 | I'm going to try to skip the jargon as much as possible
00:03:24.840 | and just give you the most interesting details. Essentially they found a way to compress large
00:03:30.180 | language models like Llama or Falcon across model scales. And even though other people had done this
00:03:35.740 | they were able to achieve it in a near lossless way. This has at least two significant implications.
00:03:41.140 | One that bigger models can be used on smaller devices even as small as an iPhone. And second
00:03:47.660 | the inference speed gets sped up as you can see by 15 to 20 percent. In translation that means the
00:03:53.880 | output from the language model comes out more quickly. To the best of my understanding the way they did this
00:03:59.080 | is that they identified and isolated outlier weights. In translation that's the parts of
00:04:04.360 | the model that are most significant to its performance. They stored those with more bits,
00:04:08.800 | that is to say with higher precision, while compressing all other weights to three to four
00:04:14.320 | bits. That reduces the amount of RAM or memory required to operate the model. There were existing
00:04:20.020 | methods of achieving this shrinking or quantization like round to nearest or
00:04:24.800 | GPTQ. But they ended up with more errors and generally less accuracy in text generation as
00:04:30.360 | we'll see in a moment. SPQR did best across the model scales. To cut a long story short they
00:04:36.320 | envisage models like Llama or indeed Orca, which I just did a video on, existing on devices such as
00:04:42.320 | an iPhone 14. If you haven't watched my last video on the Orca model do check it out because it shows
00:04:47.540 | that in some tests that 13 billion parameter model is competitive with ChatGPT or GPT 3.5.
00:04:54.780 | Imagining that on my phone which has 12 gigs of RAM is quite something. Here are a few examples
00:05:00.600 | comparing the original models with the outputs using SPQR and the older form of quantization.
00:05:06.780 | And when you notice how similar the outputs are from SPQR to the original model just remember
00:05:12.080 | that it's about four times smaller in size. And yes they did compare Llama and Falcon at
00:05:18.440 | 40 billion parameters across a range of tests using SPQR. Remember that this is the base Llama
00:05:24.760 | model accidentally leaked by Meta, not an enhanced version like Orca. And you can see the results for
00:05:30.900 | Llama and Falcon are comparable. And here's what they say at the end. SPQR might have a wide-reaching
00:05:36.360 | effect on how large language models are used by the general population to complete useful tasks.
00:05:42.160 | But they admit that LLMs are inherently a dual-use technology that can bring both significant benefits
00:05:48.420 | and serious harm. And it is interesting the waiver that they give. However we believe that the marginal
00:05:54.080 | impact of SPQR will be positive or neutral. In other words our algorithm does not create models with new
00:06:01.200 | capabilities and risks. It only makes existing models more accessible. Speaking of accessible
00:06:06.560 | it was of course Meta that originally leaked Llama. And they are not only working on a rival
00:06:12.340 | to Twitter apparently called Project 92 but also on bringing AI assistants to things like WhatsApp
00:06:19.360 | and Instagram. But Mark Zuckerberg the head of Meta who does seem to be rather influenced by
00:06:24.720 | Yann LeCun's thinking does have some questions about autonomous AI.
00:06:29.780 | My own view is that where we really need to be careful is on the development of autonomy and how we think about that.
00:06:39.240 | Because it's actually the case that relatively simple and unintelligent things that have runaway autonomy
00:06:45.240 | and just spread themselves (or you know, it's like, we have a word for that, it's a virus) can be simple
00:06:50.340 | computer code that is not particularly intelligent but just spreads itself and does a lot of harm. A
00:06:54.700 | lot of what I think we need to develop when people talk about safety and responsibility is really the
00:06:59.740 | governance on the autonomy that can be given to systems. It does seem to me though that any model
00:07:05.580 | release will be fairly quickly made autonomous. Look at the just two-week gap between the release of GPT-4
00:07:11.820 | and the release of AutoGPT. So anyone releasing a model needs to assume that it's going to be made
00:07:17.660 | to be autonomous fairly quickly. Next Zuckerberg talked about super intelligence and compared it to
00:07:23.800 | a corporate model.
00:07:24.680 | You still didn't answer the question of what year we're going to have super intelligence. I'd like to hold you to that. No I'm just kidding. But is there something you could say about the timeline as you think about the development of AGI super intelligence systems? Sure. So I still don't think I have any particular insight on when like a singular AI system that is a general intelligence will get created. But I think the one thing that most people in the discourse
00:07:54.660 | that I've seen about this haven't really grappled with is that we do seem to have organizations and
00:08:01.120 | structures in the world that exhibit greater than human intelligence already. So one example is a
00:08:08.920 | company. But I certainly hope that Meta with tens of thousands of people makes smarter decisions than
00:08:15.480 | one person. But I think that that would be pretty bad if it didn't. I think he's underestimating a
00:08:20.000 | super intelligence which would be far faster and more impressive I believe
00:08:24.640 | than any company. Here's one quick example from DeepMind where their AlphaDev system sped up
00:08:30.080 | sorting small sequences by 70 percent. Because operations like this are performed trillions of
00:08:35.360 | times a day this made headlines. But then I saw this. Apparently GPT-4 discovered the same trick
00:08:41.840 | as AlphaDev and the author sarcastically asked can I publish this in Nature? And to be honest
00:08:47.320 | when you see the prompts that he used it strikes me that he was using GPT 3.5, the original ChatGPT
00:08:53.920 | in green.
00:08:54.620 | Not GPT-4. Anyway back to super intelligence and science at digital speed. When you hear the
00:09:00.060 | following anecdote from Demis Hassabis you might question the analogy between a corporation and a
00:09:05.820 | super intelligence. AlphaFold is a sort of science at digital speed in two ways. One is that it can
00:09:11.580 | fold the proteins in you know milliseconds instead of taking years of experimental work right. So 200
00:09:17.340 | million proteins, you times that by a PhD time of five years, that's like a billion years of PhD time
00:09:22.900 | right by some measure that has been
00:09:24.600 | done in a year.
00:09:25.880 | Billions of years of PhD time in the course of a single year of computation. Honestly AI is going
00:09:32.560 | to accelerate absolutely everything and it's not going to be like anything we have seen before.
00:09:37.640 | Thank you so much for watching and have a wonderful day.