9 AI Developments: HeyGen 2.0 to AjaxGPT, Open Interpreter to NExT-GPT and Roblox AI
Chapters
0:00 Intro
3:31 Optimized Prompts
6:34 Gemini News
8:28 Llama
11:59 Apple AjaxGPT
00:00:00.000 |
There were nine impactful AI developments in the last few days that I wanted to tell you guys about. 00:00:06.260 |
From the frankly startling HeyGen video translation, to the epic new prompt optimising paper, 00:00:13.240 |
and from Apple's Ajax GPT to Open Interpreter, NExT-GPT, yet more Google Gemini news, 00:00:20.340 |
and even Roblox AI. But I must start with HeyGen. 00:00:25.000 |
You probably already heard from AI Explained that HeyGen can generate lifelike videos, 00:00:30.820 |
and is available as a plugin to ChatGPT. But how about video language dubbing? 00:00:36.000 |
Well, today I got access to their new Avatar 2.0 feature, and I decided to test it out with 00:00:43.560 |
Sam Altman. Not with the real Sam Altman, of course, but with his testimony to the Senate. 00:00:48.540 |
And I want Spanish language speakers to tell me how accurate they think this is. 00:00:53.380 |
My worst fears are that we are causing significant damage to the field, technology, industry, and the world. 00:00:59.960 |
I think that could happen in various ways. That's why we started the company. 00:01:05.320 |
It's a big part of why I'm here today, and we've been able to spend time with you. 00:01:10.620 |
If this technology fails, it can be disastrous. We want to be clear about our position on this. 00:01:16.900 |
I have been researching three or four tools, including this one, 00:01:20.660 |
to translate my videos into dozens of languages. 00:01:25.980 |
But time waits for no man, so let me move on to Open Interpreter. 00:01:30.580 |
I've been using the version released five days ago. 00:01:33.560 |
What is it? Well, an open source code interpreter. Here is a brief preview. 00:02:06.760 |
Of course, I've been trying to figure out how to use Open Interpreter. 00:02:08.080 |
I've been trying it out intensively, and while it's not perfect, it has proven useful. 00:02:14.280 |
Download this YouTube video in 1440p, e.g. using PyTube, and clip out 23:18 to 23:38, save to desktop, naming the file 'Altman'. 00:02:26.700 |
That's a clip I wanted to use in this very video. 00:02:30.000 |
Now, okay, it wouldn't have taken me that long to do it manually, but this process was a few seconds. 00:02:36.640 |
I agreed to let it run the code a few times, and here was the end result. 00:02:41.000 |
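For context, here is a minimal sketch of the kind of script Open Interpreter might write and run for that request, assuming pytube and moviepy are installed; the URL, timestamps, and filenames are placeholders, not the code it actually generated.

```python
# Illustrative only: the sort of script Open Interpreter might generate
# for the request above. URL, timestamps and paths are placeholders.
from pathlib import Path
from pytube import YouTube
from moviepy.editor import VideoFileClip

url = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder
desktop = Path.home() / "Desktop"

# Download the 1440p stream. At this resolution YouTube serves
# video-only (adaptive) streams, so a real script may also need to
# merge in a separate audio track.
stream = YouTube(url).streams.filter(res="1440p").first()
source = stream.download(output_path=str(desktop), filename="source.mp4")

# Clip 23:18 to 23:38 and save the result to the desktop.
clip = VideoFileClip(source).subclip(23 * 60 + 18, 23 * 60 + 38)
clip.write_videofile(str(desktop / "Altman.mp4"))
```

The point is less the code itself than that a plain-language request got turned into working code and executed, with my approval at each step.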
You can try to guess why I picked out this clip from a recent Sam Altman interview. 00:02:54.200 |
I think it maybe took like a little bit of a flat line during the summer, which happens for lots of products, but it is... 00:03:07.300 |
Obviously, we shouldn't automatically believe the CEO of the company about how many people are using his product. 00:03:14.380 |
But I do think it points to a counter-narrative to the argument that ChatGPT usage is continuing to slow down into the autumn. 00:03:21.940 |
You don't have to use GPT-4 either, and more generally, I think this points to a change in the way we will use computers in the near future. 00:03:30.900 |
And now let's talk about this paper from Google DeepMind. 00:03:35.020 |
I'll talk about the Gemini model in a second, but first, I found this paper fascinating. 00:03:39.140 |
I will hopefully be doing a deeper dive with one of the authors, but for now, the big picture is this. 00:03:44.980 |
Language models can come up with optimized prompts for language models. 00:03:49.580 |
These aren't small optimizations either, and nor do they work with only one model. 00:03:54.620 |
The paper says that, with a variety of large language models, the best prompts optimized by their method outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench 00:04:09.100 |
Hard tasks. Those are long-standing tasks known for their difficulty for large language 00:04:14.760 |
models. To massively oversimplify, models like PaLM 2 and GPT-4 can be given a meta-prompt. For 00:04:21.120 |
example, generate a new instruction that achieves a higher accuracy on a particular task. The 00:04:26.360 |
language models are then shown how previous prompts worked out. In this example, for a particular task, 00:04:31.880 |
'Let's figure it out' scored 61, while 'Let's solve the problem' scored 63 out of 100. This was the 00:04:38.280 |
mathematics problem down here. And then they're asked: 'Generate an instruction that is different 00:04:43.080 |
from all the instructions above and has a higher score than all the instructions above. The 00:04:48.180 |
instruction should be concise, effective, and generally applicable to all problems.' And apparently, 00:04:53.400 |
GPT-4 was particularly good at looking at the trajectory of optimizations, the patterns and 00:05:00.240 |
trends about what produced better prompts on a particular task. For example, you might start with a simple seed instruction, 00:05:08.160 |
and then the language model would propose iterations like 'Let's think carefully about the problem and 00:05:14.260 |
solve it together.' That got 63.2, and you can see the accuracy gradually going up. 00:05:20.300 |
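To make that loop concrete, here is a minimal sketch of the idea, not the paper's actual implementation; llm() and score_on_task() are placeholders for a real model API and a real benchmark harness, and the meta-prompt wording is paraphrased rather than copied exactly.

```python
# A minimal sketch of the prompt-optimization loop described above.
# llm() and score_on_task() are placeholders you must fill in.
def llm(meta_prompt: str) -> str:
    # Placeholder: call a real model here (e.g. PaLM 2 or GPT-4).
    raise NotImplementedError

def score_on_task(instruction: str) -> float:
    # Placeholder: prepend the instruction to each benchmark question,
    # grade the model's answers, and return accuracy out of 100.
    raise NotImplementedError

def optimize(seed_instructions, steps=20, keep_top=20):
    # The trajectory of (instruction, score) pairs seen so far.
    trajectory = [(inst, score_on_task(inst)) for inst in seed_instructions]
    for _ in range(steps):
        # Show the best instructions so far, worst-to-best, so the
        # model can read the trend of what scores higher.
        shown = sorted(trajectory, key=lambda p: p[1])[-keep_top:]
        history = "\n".join(f"text: {i}\nscore: {s:.1f}" for i, s in shown)
        meta_prompt = (
            "Here are some instructions with their scores on a task:\n"
            f"{history}\n"
            "Generate an instruction that is different from all the "
            "instructions above and has a higher score than all of them. "
            "The instruction should be concise, effective, and generally "
            "applicable to all problems."
        )
        candidate = llm(meta_prompt).strip()
        trajectory.append((candidate, score_on_task(candidate)))
    return max(trajectory, key=lambda p: p[1])  # best instruction found
```

Apparently, at least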
for math problems, PaLM 2 preferred concise prompts, while GPT models liked ones that were long and 00:05:26.640 |
detailed. And nor was it just about the semantics or meanings of the prompts. The same meanings 00:05:32.260 |
phrased differently could get radically different results. For example, with PaLM 2, 'Let's think step by step' 00:05:38.140 |
got 71.8, whereas 'Let's solve the problem together' has an accuracy of 60.5. But then if you put those 00:05:45.020 |
two together and say 'Let's work together to solve this problem step by step', you only get 49.4, 00:05:51.980 |
although semantically it's just a combination of those two instructions. For the 00:05:56.700 |
original SmartGPT, I used this prompt: 'Let's work this out in a step-by-step way to be sure we have 00:06:01.580 |
the right answer.' That's because it performed best for GPT-4. As you can see here, it doesn't perform 00:06:07.000 |
best for PaLM 2, although notice it does perform better than just an empty string. What does perform 00:06:12.760 |
best? Well, 'Take a deep breath and work on this problem step-by-step.' Also note the difference 00:06:18.180 |
with beginning your answer with this prefix or beginning your question with a prefix. Anyway I 00:06:24.180 |
am hoping to do a deeper dive on this paper with one of the authors so for now I'll leave it there. 00:06:29.800 |
Suffice to say that prompt engineering is not a solved science yet. But what was the Gemini news 00:06:36.060 |
that I promised you from Google? Well, this was published just 14 hours ago in The Information. 00:06:41.380 |
Google has as of yesterday given a small group of companies access to an early version of Gemini. 00:06:47.560 |
That is their direct competitor to OpenAI's GPT-4. According to a person who has tested it, 00:06:53.500 |
Gemini has an advantage over GPT-4 in at least one respect. The model leverages reams of Google's 00:06:59.460 |
proprietary data from its consumer products in addition to public information scraped from the 00:07:05.080 |
web. So the model 00:07:06.040 |
should be especially accurate with all of those Google search histories when it comes to 00:07:10.680 |
understanding users' intentions with particular queries. And apparently, compared to GPT-4, it 00:07:16.540 |
generates fewer incorrect answers known as hallucinations. And again according to them 00:07:21.720 |
Gemini will feature vastly improved code generating abilities for software developers. Although note 00:07:27.560 |
it says compared to its existing models. It didn't technically say compared to GPT-4. Note that PaLM 00:07:34.060 |
2 didn't score particularly highly 00:07:35.620 |
for coding, so if that's the baseline, bear that in mind. The version they're giving developers 00:07:41.040 |
isn't their largest version though which apparently will be on par with GPT-4. And in a first for this 00:07:47.580 |
channel, I'm going to make a direct prediction using Metaculus. I'm going to predict that there 00:07:53.080 |
will indeed be at least three months of third-party safety evaluations conducted on Gemini before its 00:07:59.860 |
deployment. I think they finished training the model sometime in summer so it will be more like 00:08:04.660 |
six months if it's released in December. The heart of this channel is about understanding 00:08:09.460 |
and navigating the future of AI, so I am super proud that Metaculus are my first sponsors. 00:08:15.740 |
They have aggregate forecasts on a range of AI related questions. Yes it's free to sign up with 00:08:23.020 |
the link in the description so show them some love and say you came from AI Explained. Speaking of 00:08:29.240 |
the future, though, we learned this week in the Wall Street Journal that Meta plans to develop 00:08:34.140 |
a new model, Llama 3, sometime in early 2024. That will apparently be several times more powerful than Llama 2. Even more interesting to me, though, was this exchange at a recent Meta GenAI social: 'We have the compute to train Llama 3 and 4. The plan is for Llama 3 to be as good as GPT-4.' 'Wow, if Llama 3 is as good as GPT-4, will you guys still open source it?' 'Yeah, we will. Sorry, alignment people.' You can let me know in the comments what you think about that exchange. 00:09:04.100 |
That is of course in complete contravention of what Senators Blumenthal and Hawley have put out. This 00:09:10.440 |
week they released the Bipartisan Framework for a US AI Act. In it, they actually mentioned deepfakes, 00:09:16.520 |
which I kind of showed you earlier with HeyGen. But they also focused on AI audits and establishing 00:09:22.480 |
an oversight body that should have the authority to conduct audits of companies seeking licenses. 00:09:27.780 |
But I suspect the people signing up to work in that auditing office will have to commit 00:09:33.600 |
to not working for any of the AI companies for the rest of their lives. That's going to take a 00:09:39.040 |
particularly motivated individual particularly on public sector pay levels. Why do I say that? 00:09:44.460 |
Well, here's Mustafa Suleyman. He recently said this on the 80,000 Hours podcast. 00:09:49.460 |
Well I'm really stuck. I think it's really hard. There is another direction which involves academic 00:09:55.440 |
groups getting more access and either actually doing red teaming or doing audits of scale or 00:10:03.100 |
audits of model capabilities. Right. They're the three proposals that I've heard made and I've been 00:10:08.520 |
very supportive of and have certainly explored with people at Stanford and elsewhere. But I think 00:10:13.840 |
there's a real problem there which is if you take the average PhD student or postdoctoral researcher 00:10:18.600 |
that might work on this in a couple of years they may well go to a commercial lab. Right. And so if 00:10:25.500 |
we're to give them access then they'll probably take that knowledge and expertise elsewhere 00:10:30.620 |
potentially to a competitor. I mean, it's an open 00:10:32.840 |
labor market, after all. And when we heard this week that the IRS in America are going to use AI 00:10:38.240 |
to catch tax evasion it made me think that it's going to increasingly be a cat and mouse game 00:10:43.620 |
between governments and auditors using AI on the one side and the companies developing the AI on 00:10:49.160 |
the other side. If the IRS has an AI that can detect tax evasion well then a hedge fund can 00:10:54.680 |
just make an AI to obscure that tax evasion. Seems to me that in all of this whoever has the most 00:11:00.300 |
compute will win. And remember, these won't just be 00:11:02.820 |
single-modality language models anymore. To take one crazy example this week, we now have 00:11:08.020 |
'Smell to Text'. It's a much more narrow AI trained in a very different way to GPT models 00:11:13.860 |
but it matches well with expert humans on novel smells. And then there's 'Protein Chat' which I 00:11:19.300 |
didn't get a chance to talk about earlier in the year. The so-called 'Protein GPT' enables users 00:11:24.900 |
to upload proteins, ask questions and engage in interactive conversations to gain insights. 00:11:30.580 |
And if that's not enough modalities, how about this? This is 'NExT-GPT', a multimodal LLM released 00:11:36.980 |
two days ago that can go from any modality to any modality. Obviously there should be an asterisk 00:11:42.980 |
over 'any', it isn't quite 'any' yet, but we're talking about images, audio, video and then the 00:11:47.780 |
output being images, audio, text, video. One obvious question is: do we want one model to 00:11:53.540 |
be good at everything, or do we want narrower AI that's good at individual tasks? And this links to 00:12:00.420 |
Ajax GPT from Apple. Now, of course, I did watch the iPhone launch, but I find this more 00:12:06.180 |
interesting. This was an exclusive for The Information, and they talk about how Apple's 00:12:10.980 |
LLM is designed to boost Siri. And it almost sounds to me like Open Interpreter, where you 00:12:16.700 |
can automate tasks involving multiple steps. For example, telling Siri to create a GIF using the 00:12:22.920 |
last five photos you've taken and text it to a friend. 00:12:28.320 |
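As an aside, here is a minimal sketch of that kind of multi-step task done by hand in Python with Pillow; this is purely illustrative, not Apple's API, and the photo directory and output name are placeholders.

```python
# Illustrative only: building a GIF from the five most recent photos,
# the kind of multi-step task such an assistant would automate.
# Paths are placeholders; this is not Apple's API.
from pathlib import Path
from PIL import Image

photos_dir = Path.home() / "Pictures"  # placeholder for the camera roll
# Take the five most recently modified JPEGs.
latest = sorted(photos_dir.glob("*.jpg"), key=lambda p: p.stat().st_mtime)[-5:]
frames = [Image.open(p).convert("RGB").resize((480, 360)) for p in latest]
frames[0].save(
    photos_dir / "latest.gif",
    save_all=True,             # write an animated GIF
    append_images=frames[1:],  # remaining frames
    duration=500,              # ms per frame
    loop=0,                    # loop forever
)
```

An on-device assistant would presumably chain steps like these, plus the texting step, from a single spoken request. And this was the most interesting part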
of the piece for me. Earlier in the article, they talked about how they're spending millions of 00:12:32.660 |
dollars a day on Ajax GPT. They're still quite far behind, because apparently Ajax GPT beats GPT-3.5, 00:12:39.700 |
the original ChatGPT, but not GPT-4. The focus is on running LLMs on your device, with the goal 00:12:46.380 |
of improving privacy and performance. So Ajax GPT might not be the best LLM, but they're pitching 00:12:52.440 |
it as the best LLM on your phone. The model they have at the moment is apparently too big. It's 200 00:12:58.660 |
billion parameters. Even a MacBook might struggle to run that, but they might have different sizes 00:13:04.040 |
of Ajax, some small enough to run on an iPhone. Of course, from a user point of view, that would mean 00:13:10.480 |
you can use the model offline, unlike, say, the ChatGPT app. 00:13:16.460 |
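For a rough sense of why a 200 billion parameter model is too big, here are my own back-of-the-envelope numbers, not the article's: weight memory alone scales as parameter count times bytes per parameter.

```python
# Back-of-the-envelope weight memory for a 200-billion-parameter model.
# My own estimate, not from the article; ignores activations and KV cache.
params = 200e9
for precision, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    gigabytes = params * bytes_per_param / 1e9
    print(f"{precision}: ~{gigabytes:.0f} GB of weights")
# fp16 comes to roughly 400 GB, hence even a MacBook struggling.
```

Even aggressively quantized, that is far beyond a phone's memory, which is presumably why smaller variants would be needed. Let's move on now to a lighter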
development, albeit one that might affect hundreds of millions of people, including my nephew. 00:13:21.580 |
The online game platform Roblox is bringing in a new AI chatbot that's going to allow creators to 00:13:28.840 |
build virtual worlds just by typing prompts. And that's the crazy thing. All of this is going to 00:13:34.340 |
become intuitive to the next generation. Children today are just going to expect their apps to be 00:13:40.160 |
interactive and customizable on demand. And yes, we have covered a lot today, so let me know what 00:13:45.860 |
you think. I'm going to end with an AI image that has taken the internet by storm, as well as a few 00:13:52.400 |
more things. 00:13:56.340 |
Thanks as always for watching to the end. Do check out Metaculus in the description. And as ever,