
The AI News You Might Have Missed This Week - Zuckerberg to Falcon w/ SPQR



00:00:00.000 | Here are seven developments in AI that you might have missed this week from ChatGPT avatars to
00:00:06.040 | open source models on an iPhone and AlphaDev to Zuckerberg's projections of super intelligence.
00:00:12.800 | But first something a little unconventional with a modicum of wackiness: embodied VR chess.
00:00:20.480 | This robot on my left is being controlled by a human in a suit over there and this robot on my
00:00:24.960 | right is being controlled by a human over there. They both have feedback gloves, they have VR
00:00:30.360 | headsets, and they're seeing everything the robot sees. Now specifically today we're looking at
00:00:35.520 | avatars, robot avatars to be precise. They can play chess but they can do much more: they can
00:00:40.720 | perform maintenance, rescue operations, and do anything that a human can do with its hands and
00:00:45.280 | eyes. Could this be the future of sports and things like MMA where you fight using robotic
00:00:50.800 | embodied avatars? But for something a little less intense we have a robot that can do a lot more
00:00:54.940 | than just playing chess. We have this robot chef who learned by watching videos.
00:00:58.560 | It does make me wonder how long before we see something like this at a McDonald's near you.
00:01:14.780 | But now it's time to talk about something that is already available which is the HeyGen plugin
00:01:20.500 | in ChatGPT. It allows you to fairly quickly create an avatar video
00:01:24.920 | of the text produced by ChatGPT and I immediately thought of one use case that I think could take
00:01:31.140 | off in the near future. By combining the Wolfram plugin with HeyGen I asked ChatGPT to solve this
00:01:37.700 | problem and then output an explainer video using an avatar. A quick tip here is to tell ChatGPT
00:01:44.480 | the plugins that you want it to use otherwise it's kind of reluctant to do so. As you can see
00:01:49.800 | ChatGPT using Wolfram was able to get the question right but for some people it's a little bit
00:01:54.900 | more complicated. So check this out.
00:01:58.360 | The retail price of a certain kettlebell is $70. This price represents a 25% profit over the wholesale cost.
00:02:06.520 | To find the profit per kettlebell sold at retail price we first need to find the wholesale cost.
00:02:12.320 | We know that $70 is 125% of the wholesale cost.
00:02:17.460 | Next we have Runway Gen 2 which I think gives us a glimpse of what the future of text to video will be like.
00:02:24.880 | A long long time ago at Lady Winterbottom's lovely tea party which is in the smoking ruins and ashes
00:02:31.920 | of New York City. A fierce woman ain't playing no games and is out to kick some butts against the
00:02:37.520 | unimaginable brutal merciless and scary lobby boy of the delightful Grand Budapest Hotel.
00:02:42.740 | And everything seems doomed and lost until a super handsome man arises the true hero and
00:02:50.100 | great mastermind behind all of this. Now of course that's not perfect and as you can see
00:02:54.860 | from my brief attempt here there is lots to work on. But just remember where Midjourney was a year
00:03:00.380 | ago to help you imagine where Runway will be in a year's time. And speaking of a year's time if AI
00:03:06.140 | generated fake images are already being used politically imagine how they or fake videos are going
00:03:11.140 | to be used in a year's time. But now it's time for the paper that I had to read two or three times
00:03:16.340 | to grasp and it will be of interest to anyone who is following developments in open source models.
00:03:22.340 | I'm going to try to skip the jargon as much as possible
00:03:24.840 | and just give you the most interesting details. Essentially they found a way to compress large
00:03:30.180 | language models like Llama or Falcon across model scales. And even though other people had done this
00:03:35.740 | they were able to achieve it in a near lossless way. This has at least two significant implications.
00:03:41.140 | One that bigger models can be used on smaller devices even as small as an iPhone. And second
00:03:47.660 | the inference speed gets sped up as you can see by 15 to 20 percent. In translation that means the
00:03:53.880 | output from the language model comes out more quickly. To the best of my understanding the way they did this
00:03:59.080 | is that they identified and isolated outlier weights. In translation that's the parts of
00:04:04.360 | the model that are most significant to its performance. They stored those with more bits,
00:04:08.800 | that is to say with higher precision, while compressing all other weights to three to four
00:04:14.320 | bits. That reduces the amount of RAM or memory required to operate the model. There were existing
00:04:20.020 | methods of achieving this shrinking or quantization like round to nearest or
00:04:24.800 | GPTQ. But they ended up with more errors and generally less accuracy in text generation as
00:04:30.360 | we'll see in a moment. SPQR did best across the model scales. To cut a long story short they
00:04:36.320 | envisage models like Llama or indeed Orca, which I just did a video on, existing on devices such as
00:04:42.320 | an iPhone 14. If you haven't watched my last video on the Orca model do check it out because it shows
00:04:47.540 | that in some tests that 13 billion parameter model is competitive with ChatGPT or GPT 3.5.
00:04:54.780 | Imagining that on my phone which has 12 gigs of RAM is quite something. Here are a few examples
00:05:00.600 | comparing the original models with the outputs using SPQR and the older form of quantization.
00:05:06.780 | And when you notice how similar the outputs are from SPQR to the original model just remember
00:05:12.080 | that it's about four times smaller in size. And yes they did compare Llama and Falcon at
00:05:18.440 | 40 billion parameters across a range of tests using SPQR. Remember that this is the base Llama
00:05:24.760 | model accidentally leaked by Meta, not an enhanced version like Orca. And you can see the results for
00:05:30.900 | Llama and Falcon are comparable. And here's what they say at the end. SPQR might have a wide-reaching
00:05:36.360 | effect on how large language models are used by the general population to complete useful tasks.
00:05:42.160 | But they admit that LLMs are inherently a dual-use technology that can bring both significant benefits
00:05:48.420 | and serious harm. And it is interesting the waiver that they give. However we believe that the marginal
00:05:54.080 | impact of SPQR will be positive or neutral. In other words our algorithm does not create models with new
00:06:01.200 | capabilities and risks. It only makes existing models more accessible. Speaking of accessible
00:06:06.560 | it was of course Meta that originally leaked Llama. And they are not only working on a rival
00:06:12.340 | to Twitter apparently called Project 92 but also on bringing AI assistants to things like WhatsApp
00:06:19.360 | and Instagram. But Mark Zuckerberg the head of Meta who does seem to be rather influenced by
00:06:24.720 | Yann LeCun's thinking does have some questions about autonomous AI.
00:06:29.780 | My own view is that where we really need to be careful is on the development of autonomy and how we think about that.
00:06:39.240 | Because it's actually the case that relatively simple and unintelligent things that have runaway autonomy
00:06:45.240 | and just spread themselves (or you know, it's like, we have a word for that, it's a virus) can be simple
00:06:50.340 | computer code that is not particularly intelligent but just spreads itself and does a lot of harm. A
00:06:54.700 | lot of what I think we need to develop when people talk about safety and responsibility is really the
00:06:59.740 | governance on the autonomy that can be given to systems. It does seem to me though that any model
00:07:05.580 | release will be fairly quickly made autonomous. Look at the just two-week gap between the release of GPT-4
00:07:11.820 | and the release of AutoGPT. So anyone releasing a model needs to assume that it's going to be made
00:07:17.660 | to be autonomous fairly quickly. Next Zuckerberg talked about super intelligence and compared it to
00:07:23.800 | a corporate model.
00:07:24.680 | You still didn't answer the question of what year we're going to have super intelligence. I'd like to hold you to that. No I'm just kidding. But is there something you could say about the timeline as you think about the development of AGI super intelligence systems? Sure. So I still don't think I have any particular insight on when like a singular AI system that is a general intelligence will get created. But I think the one thing that most people in the discourse
00:07:54.660 | that I've seen about this haven't really grappled with is that we do seem to have organizations and
00:08:01.120 | structures in the world that exhibit greater than human intelligence already. So one example is a
00:08:08.920 | company. But I certainly hope that Meta with tens of thousands of people makes smarter decisions than
00:08:15.480 | one person. But I think that that would be pretty bad if it didn't. I think he's underestimating a
00:08:20.000 | super intelligence which would be far faster and more impressive I believe
00:08:24.640 | than any company. Here's one quick example from DeepMind where their AlphaDev system sped up
00:08:30.080 | sorting small sequences by 70 percent. Because operations like this are performed trillions of
00:08:35.360 | times a day this made headlines. But then I saw this. Apparently GPT-4 discovered the same trick
00:08:41.840 | as AlphaDev and the author sarcastically asked can I publish this in Nature? And to be honest
00:08:47.320 | when you see the prompts that he used it strikes me that he was using GPT 3.5, the original ChatGPT
00:08:53.920 | in green.
00:08:54.620 | Not GPT-4. Anyway back to super intelligence and science at digital speed. When you hear the
00:09:00.060 | following anecdote from Demis Hassabis you might question the analogy between a corporation and a
00:09:05.820 | super intelligence. AlphaFold is a sort of science at digital speed in two ways. One is that it can
00:09:11.580 | fold the proteins in you know milliseconds instead of taking years of experimental work right. So 200
00:09:17.340 | million proteins, you times that by a PhD time of five years, that's like a billion years of PhD time
00:09:22.900 | right by some measure that has been
00:09:24.600 | done in a year.
00:09:25.880 | Billions of years of PhD time in the course of a single year of computation. Honestly AI is going
00:09:32.560 | to accelerate absolutely everything and it's not going to be like anything we have seen before.
00:09:37.640 | Thank you so much for watching and have a wonderful day.