Here are seven developments in AI that you might have missed this week, from ChatGPT avatars to open source models on an iPhone, and AlphaDev to Zuckerberg's projections of super intelligence. But first, something a little unconventional with a modicum of wackiness: embodied VR chess. This robot on my left is being controlled by a human in a suit over there, and this robot on my right is being controlled by a human over there.
They both have feedback gloves, they have VR headsets, and they're seeing everything the robot sees. Now specifically today we're looking at avatars, robot avatars to be precise. They can play chess, but they can do much more: they can perform maintenance and rescue operations, and do anything that a human can do with its hands and eyes.
Could this be the future of sports and things like MMA where you fight using robotic embodied avatars? But for something a little less intense we have a robot that can do a lot more than just playing chess. We have this robot chef who learned by watching videos. It does make me wonder how long before we see something like this at a McDonald's near you.
But now it's time to talk about something that is already available, which is the HeyGen plugin in ChatGPT. It allows you to fairly quickly create an avatar of the text produced by ChatGPT, and I immediately thought of one use case that I think could take off in the near future.
By combining the Wolfram plugin with HeyGen, I asked ChatGPT to solve this problem and then output an explainer video using an avatar. A quick tip here is to tell ChatGPT the plugins that you want it to use, otherwise it's kind of reluctant to do so. As you can see, ChatGPT using Wolfram was able to get the question right, but for some people it's a little bit more complicated.
So check this out. The retail price of a certain kettlebell is $70. This price represents a 25% profit over the wholesale cost. To find the profit per kettlebell sold at retail price we first need to find the wholesale cost. We know that $70 is 125% of the wholesale cost.
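The avatar's explanation is cut off in the clip, so here is a minimal Python sketch that finishes the arithmetic it sets up. The variable names are just illustrative; the only inputs are the $70 retail price and the 25% profit over wholesale stated in the problem.

```python
# Worked version of the kettlebell problem read out by the avatar.
retail_price = 70.0   # dollars, given in the problem
markup = 0.25         # 25% profit over the wholesale cost

# Retail is 125% of wholesale, so dividing by 1.25 recovers the wholesale cost.
wholesale_cost = retail_price / (1 + markup)

# Profit per kettlebell is whatever is left after paying the wholesale cost.
profit_per_unit = retail_price - wholesale_cost

print(f"wholesale cost:  ${wholesale_cost:.2f}")   # wholesale cost:  $56.00
print(f"profit per unit: ${profit_per_unit:.2f}")  # profit per unit: $14.00
```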
Next we have Runway Gen 2, which I think gives us a glimpse of what the future of text-to-video will be like. A long, long time ago at Lady Winterbottom's lovely tea party, which is in the smoking ruins and ashes of New York City. A fierce woman ain't playing no games and is out to kick some butts against the unimaginable, brutal, merciless and scary lobby boy of the delightful Grand Budapest Hotel.
And everything seems doomed and lost until a super handsome man arises the true hero and great mastermind behind all of this. Now of course that's not perfect and as you can see from my brief attempt here there is lots to work on. But just remember where Midjourney was a year ago to help you imagine where Runway will be in a year's time.
And speaking of a year's time, if AI generated fake images are already being used politically, imagine how they, or videos, are going to be used in a year's time. But now it's time for the paper that I had to read two or three times to grasp, and it will be of interest to anyone who is following developments in open source models.
I'm going to try to skip the jargon as much as possible and just give you the most interesting details. Essentially they found a way to compress large language models like Llama or Falcon across model scales. And even though other people had done this they were able to achieve it in a near lossless way.
This has at least two significant implications. First, bigger models can be used on smaller devices, even as small as an iPhone. Second, inference is sped up, as you can see, by 15 to 20 percent. In translation, that means the output from the language model comes out more quickly.
To the best of my understanding, the way they did this is that they identified and isolated outlier weights. In translation, that's the parts of the model that are most significant to its performance. They stored those with more bits, that is to say with higher precision, while compressing all other weights to three to four bits.
That reduces the amount of RAM, or memory, required to operate the model. There were existing methods of achieving this shrinking, or quantization, like round-to-nearest or GPTQ. But they ended up with more errors and generally less accuracy in text generation, as we'll see in a moment.
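To make the idea concrete, here is a toy Python sketch of keeping outlier weights at full precision while rounding everything else to a 3-bit grid. It is not the paper's algorithm: the function names are made up for illustration, and picking outliers by largest magnitude is a simplifying assumption (the paper's criterion and its grouping of weights are more involved). It only shows why isolating a few outliers can cut the error of plain round-to-nearest.

```python
import random

def quantize(weights, bits=3, outlier_fraction=0.0):
    """Toy round-to-nearest quantizer that can exempt a few outliers.

    Assumption for illustration: outliers are simply the largest-magnitude
    weights. They are kept as-is (standing in for higher-precision storage).
    """
    n_outliers = int(len(weights) * outlier_fraction)
    by_magnitude = sorted(range(len(weights)),
                          key=lambda i: abs(weights[i]), reverse=True)
    outliers = set(by_magnitude[:n_outliers])

    # The uniform grid only has to cover the remaining (non-outlier) weights.
    dense = [w for i, w in enumerate(weights) if i not in outliers]
    lo, hi = min(dense), max(dense)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0

    restored = []
    for i, w in enumerate(weights):
        if i in outliers:
            restored.append(w)                   # kept at full precision
        else:
            code = round((w - lo) / scale)       # integer code in [0, levels]
            restored.append(lo + code * scale)   # value after dequantizing
    return restored

def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1000)]
weights[10], weights[500], weights[900] = 1.0, -1.2, 1.0  # injected outliers

plain = quantize(weights, bits=3, outlier_fraction=0.0)    # round-to-nearest only
sparse = quantize(weights, bits=3, outlier_fraction=0.01)  # 1% kept as outliers

mse_plain = mean_squared_error(weights, plain)
mse_sparse = mean_squared_error(weights, sparse)
print(f"plain 3-bit MSE:         {mse_plain:.6f}")
print(f"3-bit + outliers MSE:    {mse_sparse:.6f}")
```

Without the outlier set, the three large values stretch the grid so far that the many small weights all land on coarse steps; with them set aside, the same 3 bits cover a much narrower range.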
SPQR did best across the model scales. To cut a long story short, they envisage models like Llama, or indeed Orca, which I just did a video on, existing on devices such as an iPhone 14. If you haven't watched my last video on the Orca model, do check it out, because it shows that in some tests that 13 billion parameter model is competitive with ChatGPT or GPT-3.5.
Imagining that on my phone, which has 12 gigs of RAM, is quite something. Here are a few examples comparing the original models with the outputs using SPQR and the older form of quantization.
And when you notice how similar the outputs are from SPQR to the original model just remember that it's about four times smaller in size. And yes they did compare Llama and Falcon at 40 billion parameters across a range of tests using SPQR. Remember that this is the base Llama model accidentally leaked by Meta not an enhanced version like Orca.
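As a rough sanity check on the "about four times smaller" figure, here is a back-of-the-envelope Python estimate for a 13 billion parameter model. Two assumptions are mine, not from the video: the original weights are stored in 16 bits, and the compressed model averages about 4 bits per weight. It also ignores everything other than the weights, such as activations and the operating system.

```python
def model_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 13e9                           # 13 billion parameters
fp16_gb = model_size_gb(n_params, 16)     # assumption: 16-bit original weights
q4_gb = model_size_gb(n_params, 4)        # assumption: ~4 bits per weight on average

print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB, "
      f"ratio: {fp16_gb / q4_gb:.0f}x")
# 16-bit: 26.0 GB, 4-bit: 6.5 GB, ratio: 4x
```

Under those assumptions the weights alone drop from more than a 12 GB phone could hold to roughly half of it.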
And you can see the results for Llama and Falcon are comparable. And here's what they say at the end. SPQR might have a wide-reaching effect on how large language models are used by the general population to complete useful tasks. But they admit that LLMs are inherently a dual-use technology that can bring both significant benefits and serious harm.
And it is interesting the waiver that they give. However, we believe that the marginal impact of SPQR will be positive or neutral. In other words, our algorithm does not create models with new capabilities and risks. It only makes existing models more accessible.
Speaking of accessible, it was of course Meta that originally leaked Llama. And they are not only working on a rival to Twitter, apparently called Project 92, but also on bringing in AI assistants to things like WhatsApp and Instagram. But Mark Zuckerberg, the head of Meta, who does seem to be rather influenced by Yann LeCun's thinking, does have some questions about autonomous AI.
My own view is that where we really need to be careful is on the development of autonomy and how we think about that. Because it's actually the case that relatively simple and unintelligent things that have runaway autonomy and just spread themselves, or you know, it's like we have a word for that, it's a virus.
It can be simple computer code that is not particularly intelligent but just spreads itself and does a lot of harm. A lot of what I think we need to develop when people talk about safety and responsibility is really the governance on the autonomy that can be given to systems. It does seem to me though that any model release will be fairly quickly made autonomous.
Look at the just two-week gap between the release of GPT-4 and the release of AutoGPT. So anyone releasing a model needs to assume that it's going to be made to be autonomous fairly quickly. Next Zuckerberg talked about super intelligence and compared it to a corporate model. So what does that mean?
Well it's a very simple thing. It's a very simple operation. You still didn't answer the question of what year we're going to have super intelligence. I'd like to hold you to that. No I'm just kidding. But is there something you could say about the timeline as you think about the development of AGI super intelligence systems?
Sure. So I still don't think I have any particular insight on when like a singular AI system that is a general intelligence will get created. But I think the one thing that most people in the discourse that I've seen about this haven't really grappled with is that we do seem to have organizations and structures in the world that exhibit greater than human intelligence already.
So one example is a company. But I certainly hope that Meta with tens of thousands of people makes smarter decisions than one person. But I think that that would be pretty bad if it didn't. I think he's underestimating a super intelligence which would be far faster and more impressive I believe than any company.
Here's one quick example from DeepMind, where their AlphaDev system sped up sorting small sequences by 70 percent. Because operations like this are performed trillions of times a day, this made headlines. But then I saw this. Apparently GPT-4 discovered the same trick as AlphaDev, and the author sarcastically asked, can I publish this on Nature?
And to be honest, when you see the prompts that he used, it strikes me that he was using GPT-3.5, the original ChatGPT, in green, not GPT-4. Anyway, back to super intelligence and science at digital speed. When you hear the following anecdote from Demis Hassabis, you might question the analogy between a corporation and a super intelligence.
AlphaFold is a sort of science at digital speed in two ways. One is that it can fold the proteins in, you know, milliseconds instead of taking years of experimental work, right. So 200 million proteins, you times that by a PhD time of five years, that's like a billion years of PhD time, right, by some measure that has been done.
So it's a super intelligence and science at digital speed done in a year. Billions of years of PhD time in the course of a single year of computation. Honestly AI is going to accelerate absolutely everything and it's not going to be like anything we have seen before. Thank you so much for watching and have a wonderful day.