9 AI Developments: HeyGen 2.0 to AjaxGPT, Open Interpreter to NExT-GPT and Roblox AI
Chapters
0:00 Intro
3:31 Optimized Prompts
6:34 Gemini News
8:28 Llama
11:59 Apple AjaxGPT
00:00:00.000 |
There were nine impactful AI developments in the last few days that I wanted to tell you guys about. 00:00:06.260 |
From the frankly startling HeyGen video translation, to the epic new prompt optimising paper, 00:00:13.240 |
and from Apple's Ajax GPT to Open Interpreter, NExT-GPT, yet more Google Gemini news, 00:00:20.340 |
and even Roblox AI. But I must start with HeyGen. 00:00:25.000 |
You probably already heard from AI Explained that HeyGen can generate lifelike videos, 00:00:30.820 |
and is available as a plugin to ChatGPT. But how about video language dubbing? 00:00:36.000 |
Well, today I got access to their new Avatar 2.0 feature, and I decided to test it out with 00:00:43.560 |
Sam Altman. Not with the real Sam Altman, of course, but with his testimony to the Senate. 00:00:48.540 |
And I want Spanish language speakers to tell me how accurate they think this is. 00:00:53.380 |
My worst fears are that we are causing significant damage to the field, technology, industry, and the world. 00:00:59.960 |
I think that could happen in various ways. That's why we started the company. 00:01:05.320 |
It's a big part of why I'm here today, and we've been able to spend time with you. 00:01:10.620 |
If this technology fails, it can be disastrous. We want to be clear about our position on this. 00:01:16.900 |
I have been researching three or four tools, including this one, 00:01:20.660 |
to translate my videos into dozens of languages. 00:01:25.980 |
But time waits for no man, so let me move on to Open Interpreter. 00:01:30.580 |
I've been using the version released five days ago. 00:01:33.560 |
What is it? Well, an open source code interpreter. Here is a brief preview. 00:02:06.760 |
Of course, I've been trying to figure out how to use Open Interpreter. 00:02:08.080 |
I've been trying it out intensively, and while it's not perfect, it has proven useful. 00:02:14.280 |
Download this YouTube video in 1440p, e.g. using PyTube, and clip out 23:18 to 23:38, save to desktop, naming the file 'Altman'. 00:02:26.700 |
That's a clip I wanted to use in this very video. 00:02:30.000 |
Now, okay, it wouldn't have taken me that long to do it manually, but this process was a few seconds. 00:02:36.640 |
I agreed to let it run the code a few times, and here was the end result. 00:02:41.000 |
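For context, here is a minimal sketch of the kind of script Open Interpreter might write and run for that request, assuming pytube and moviepy are installed; the URL, timestamps, and filenames are placeholders, not the code it actually generated.

```python
# Illustrative only: the sort of script Open Interpreter might generate
# for the request above. URL, timestamps and paths are placeholders.
from pathlib import Path
from pytube import YouTube
from moviepy.editor import VideoFileClip

url = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder
desktop = Path.home() / "Desktop"

# Download the 1440p stream. At this resolution YouTube serves
# video-only (adaptive) streams, so a real script may also need to
# merge in a separate audio track.
stream = YouTube(url).streams.filter(res="1440p").first()
source = stream.download(output_path=str(desktop), filename="source.mp4")

# Clip 23:18 to 23:38 and save the result to the desktop.
clip = VideoFileClip(source).subclip(23 * 60 + 18, 23 * 60 + 38)
clip.write_videofile(str(desktop / "Altman.mp4"))
```

The point is less the code itself than that a plain-language request got turned into working code and executed, with my approval at each step.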
You can try to guess why I picked out this clip from a recent Sam Altman interview. 00:02:54.200 |
I think it maybe took like a little bit of a flat line during the summer, which happens for lots of products, but it is... 00:03:07.300 |
Obviously, we shouldn't automatically believe the CEO of the company about how many people are using his product. 00:03:14.380 |
But I do think it points to a counter-narrative to the argument that ChatGPT usage is continuing to slow down into the autumn. 00:03:21.940 |
You don't have to use GPT-4 either, and more generally, I think this points to a change in the way we will use computers in the near future. 00:03:30.900 |
And now let's talk about this paper from Google DeepMind. 00:03:35.020 |
I'll talk about the Gemini model in a second, but first, I found this paper fascinating. 00:03:39.140 |
I will hopefully be doing a deeper dive with one of the authors, but for now, the big picture is this. 00:03:44.980 |
Language models can come up with optimized prompts for language models. 00:03:49.580 |
These aren't small optimizations either, and nor do they work with only one model. 00:03:54.620 |
The paper says that, with a variety of large language models, the best prompts optimized by their method outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench 00:04:09.100 |
Hard tasks. Those are long-standing tasks known for their difficulty for large language 00:04:14.760 |
models. To massively oversimplify, models like PaLM 2 and GPT-4 can be given a meta-prompt. For 00:04:21.120 |
example, generate a new instruction that achieves a higher accuracy on a particular task. The 00:04:26.360 |
language models are then shown how previous prompts worked out. In this example, for a particular task, 00:04:31.880 |
'Let's figure it out' scored 61, while 'Let's solve the problem' scored 63 out of 100. This was the 00:04:38.280 |
mathematics problem down here. And then they're asked: 'Generate an instruction that is different 00:04:43.080 |
from all the instructions above and has a higher score than all the instructions above. The 00:04:48.180 |
instruction should be concise, effective, and generally applicable to all problems.' And apparently, 00:04:53.400 |
GPT-4 was particularly good at looking at the trajectory of optimizations, the patterns and 00:05:00.240 |
trends about what produced better prompts on a particular task. For example, you might start with a simple seed instruction, 00:05:08.160 |
and then the language model would propose iterations like 'Let's think carefully about the problem and 00:05:14.260 |
solve it together.' That got 63.2, and you can see the accuracy gradually going up. 00:05:20.300 |
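To make that loop concrete, here is a minimal sketch of the idea, not the paper's actual implementation; llm() and score_on_task() are placeholders for a real model API and a real benchmark harness, and the meta-prompt wording is paraphrased rather than copied exactly.

```python
# A minimal sketch of the prompt-optimization loop described above.
# llm() and score_on_task() are placeholders you must fill in.
def llm(meta_prompt: str) -> str:
    # Placeholder: call a real model here (e.g. PaLM 2 or GPT-4).
    raise NotImplementedError

def score_on_task(instruction: str) -> float:
    # Placeholder: prepend the instruction to each benchmark question,
    # grade the model's answers, and return accuracy out of 100.
    raise NotImplementedError

def optimize(seed_instructions, steps=20, keep_top=20):
    # The trajectory of (instruction, score) pairs seen so far.
    trajectory = [(inst, score_on_task(inst)) for inst in seed_instructions]
    for _ in range(steps):
        # Show the best instructions so far, worst-to-best, so the
        # model can read the trend of what scores higher.
        shown = sorted(trajectory, key=lambda p: p[1])[-keep_top:]
        history = "\n".join(f"text: {i}\nscore: {s:.1f}" for i, s in shown)
        meta_prompt = (
            "Here are some instructions with their scores on a task:\n"
            f"{history}\n"
            "Generate an instruction that is different from all the "
            "instructions above and has a higher score than all of them. "
            "The instruction should be concise, effective, and generally "
            "applicable to all problems."
        )
        candidate = llm(meta_prompt).strip()
        trajectory.append((candidate, score_on_task(candidate)))
    return max(trajectory, key=lambda p: p[1])  # best instruction found
```

Apparently, at least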
for math problems, PaLM 2 preferred concise prompts, while GPT models liked ones that were long and 00:05:26.640 |
detailed. And nor was it just about the semantics or meanings of the prompts. The same meanings 00:05:32.260 |
phrased differently could get radically different results. For example, with PaLM 2, 'Let's think step by step' 00:05:38.140 |
got 71.8, whereas 'Let's solve the problem together' has an accuracy of 60.5. But then if you put those 00:05:45.020 |
two together and say 'Let's work together to solve this problem step by step', you only get 49.4, 00:05:51.980 |
although semantically it's just a combination of those two instructions. For the 00:05:56.700 |
original SmartGPT, I used this prompt: 'Let's work this out in a step-by-step way to be sure we have 00:06:01.580 |
the right answer.' That's because it performed best for GPT-4. As you can see here, it doesn't perform 00:06:07.000 |
best for PaLM 2, although notice it does perform better than just an empty string. What does perform 00:06:12.760 |
best? Well, 'Take a deep breath and work on this problem step-by-step.' Also note the difference 00:06:18.180 |
with beginning your answer with this prefix or beginning your question with a prefix. Anyway I 00:06:24.180 |
am hoping to do a deeper dive on this paper with one of the authors so for now I'll leave it there. 00:06:29.800 |
Suffice to say that prompt engineering is not a solved science yet. But what was the Gemini news 00:06:36.060 |
that I promised you from Google? Well, this was published just 14 hours ago in The Information. 00:06:41.380 |
Google has as of yesterday given a small group of companies access to an early version of Gemini. 00:06:47.560 |
That is their direct competitor to OpenAI's GPT-4. According to a person who has tested it, 00:06:53.500 |
Gemini has an advantage over GPT-4 in at least one respect. The model leverages reams of Google's 00:06:59.460 |
proprietary data from its consumer products in addition to public information scraped from the 00:07:05.080 |
web. So the model 00:07:06.040 |
should be especially accurate with all of those Google search histories when it comes to 00:07:10.680 |
understanding users' intentions with particular queries. And apparently, compared to GPT-4, it 00:07:16.540 |
generates fewer incorrect answers known as hallucinations. And again according to them 00:07:21.720 |
Gemini will feature vastly improved code generating abilities for software developers. Although note 00:07:27.560 |
it says compared to its existing models. It didn't technically say compared to GPT-4. Note that PaLM 00:07:34.060 |
2 didn't score particularly highly 00:07:35.620 |
for coding, so if that's the baseline, bear that in mind. The version they're giving developers 00:07:41.040 |
isn't their largest version though which apparently will be on par with GPT-4. And in a first for this 00:07:47.580 |
channel, I'm going to make a direct prediction using Metaculus. I'm going to predict that there 00:07:53.080 |
will indeed be at least three months of third-party safety evaluations conducted on Gemini before its 00:07:59.860 |
deployment. I think they finished training the model sometime in summer so it will be more like 00:08:04.660 |
six months if it's released in December. The heart of this channel is about understanding 00:08:09.460 |
and navigating the future of AI, so I am super proud that Metaculus are my first sponsors. 00:08:15.740 |
They have aggregate forecasts on a range of AI related questions. Yes it's free to sign up with 00:08:23.020 |
the link in the description so show them some love and say you came from AI Explained. Speaking of 00:08:29.240 |
the future, though, we learned this week in the Wall Street Journal that Meta plans to develop 00:08:34.140 |
a new model, Llama 3, sometime in early 2024. That will apparently be several times more powerful than Llama 2. Even more interesting to me, though, was this exchange at a recent Meta GenAI social: 'We have the compute to train Llama 3 and 4. The plan is for Llama 3 to be as good as GPT-4.' 'Wow, if Llama 3 is as good as GPT-4, will you guys still open source it?' 'Yeah, we will. Sorry, alignment people.' You can let me know in the comments what you think about that exchange. 00:09:04.100 |
That is of course in complete contravention of what Senators Blumenthal and Hawley have put out. This 00:09:10.440 |
week they released the Bipartisan Framework for a US AI Act. In it, they actually mentioned deepfakes, 00:09:16.520 |
which I kind of showed you earlier with HeyGen. But they also focused on AI audits and establishing 00:09:22.480 |
an oversight body that should have the authority to conduct audits of companies seeking licenses. 00:09:27.780 |
But I suspect the people signing up to work in that auditing office will have to commit 00:09:33.600 |
to not working for any of the AI companies for the rest of their lives. That's going to take a 00:09:39.040 |
particularly motivated individual particularly on public sector pay levels. Why do I say that? 00:09:44.460 |
Well, here's Mustafa Suleyman. He recently said this on the 80,000 Hours podcast. 00:09:49.460 |
Well I'm really stuck. I think it's really hard. There is another direction which involves academic 00:09:55.440 |
groups getting more access and either actually doing red teaming or doing audits of scale or 00:10:03.100 |
audits of model capabilities. Right. They're the three proposals that I've heard made and I've been 00:10:08.520 |
very supportive of and have certainly explored with people at Stanford and elsewhere. But I think 00:10:13.840 |
there's a real problem there which is if you take the average PhD student or postdoctoral researcher 00:10:18.600 |
that might work on this in a couple of years they may well go to a commercial lab. Right. And so if 00:10:25.500 |
we're to give them access then they'll probably take that knowledge and expertise elsewhere 00:10:30.620 |
potentially to a competitor. I mean, it's an open 00:10:32.840 |
labor market, after all. And when we heard this week that the IRS in America are going to use AI 00:10:38.240 |
to catch tax evasion it made me think that it's going to increasingly be a cat and mouse game 00:10:43.620 |
between governments and auditors using AI on the one side and the companies developing the AI on 00:10:49.160 |
the other side. If the IRS has an AI that can detect tax evasion well then a hedge fund can 00:10:54.680 |
just make an AI to obscure that tax evasion. Seems to me that in all of this whoever has the most 00:11:00.300 |
compute will win. And remember, these won't just be 00:11:02.820 |
single-modality language models anymore. To take one crazy example this week, we now have 00:11:08.020 |
'Smell to Text'. It's a much more narrow AI trained in a very different way to GPT models 00:11:13.860 |
but it matches well with expert humans on novel smells. And then there's 'Protein Chat' which I 00:11:19.300 |
didn't get a chance to talk about earlier in the year. The so-called 'Protein GPT' enables users 00:11:24.900 |
to upload proteins, ask questions and engage in interactive conversations to gain insights. 00:11:30.580 |
And if that's not enough modalities, how about this? This is 'NExT-GPT', a multimodal LLM released 00:11:36.980 |
two days ago that can go from any modality to any modality. Obviously there should be an asterisk 00:11:42.980 |
over 'any', it isn't quite 'any' yet, but we're talking about images, audio, video and then the 00:11:47.780 |
output being images, audio, text, video. One obvious question is: do we want one model to 00:11:53.540 |
be good at everything, or do we want narrower AI that's good at individual tasks? And this links to 00:12:00.420 |
Ajax GPT from Apple. Now, of course, I did watch the iPhone launch, but I find this more 00:12:06.180 |
interesting. This was an exclusive for The Information, and they talk about how Apple's 00:12:10.980 |
LLM is designed to boost Siri. And it almost sounds to me like Open Interpreter, where you 00:12:16.700 |
can automate tasks involving multiple steps. For example, telling Siri to create a GIF using the 00:12:22.920 |
last five photos you've taken and text it to a friend. 00:12:28.320 |
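As an aside, here is a minimal sketch of that kind of multi-step task done by hand in Python with Pillow; this is purely illustrative, not Apple's API, and the photo directory and output name are placeholders.

```python
# Illustrative only: building a GIF from the five most recent photos,
# the kind of multi-step task such an assistant would automate.
# Paths are placeholders; this is not Apple's API.
from pathlib import Path
from PIL import Image

photos_dir = Path.home() / "Pictures"  # placeholder for the camera roll
# Take the five most recently modified JPEGs.
latest = sorted(photos_dir.glob("*.jpg"), key=lambda p: p.stat().st_mtime)[-5:]
frames = [Image.open(p).convert("RGB").resize((480, 360)) for p in latest]
frames[0].save(
    photos_dir / "latest.gif",
    save_all=True,             # write an animated GIF
    append_images=frames[1:],  # remaining frames
    duration=500,              # ms per frame
    loop=0,                    # loop forever
)
```

An on-device assistant would presumably chain steps like these, plus the texting step, from a single spoken request. And this was the most interesting part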
of the piece for me. Earlier in the article, they talked about how they're spending millions of 00:12:32.660 |
dollars a day on Ajax GPT. They're still quite far behind, because apparently Ajax GPT beats GPT-3.5, 00:12:39.700 |
the original ChatGPT, but not GPT-4. The focus is on running LLMs on your device, with the goal 00:12:46.380 |
of improving privacy and performance. So Ajax GPT might not be the best LLM, but they're pitching 00:12:52.440 |
it as the best LLM on your phone. The model they have at the moment is apparently too big. It's 200 00:12:58.660 |
billion parameters. Even a MacBook might struggle to run that, but they might have different sizes 00:13:04.040 |
of Ajax, some small enough to run on an iPhone. Of course, from a user point of view, that would mean 00:13:10.480 |
you can use the model offline, unlike, say, the ChatGPT app. 00:13:16.460 |
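For a rough sense of why a 200 billion parameter model is too big, here are my own back-of-the-envelope numbers, not the article's: weight memory alone scales as parameter count times bytes per parameter.

```python
# Back-of-the-envelope weight memory for a 200-billion-parameter model.
# My own estimate, not from the article; ignores activations and KV cache.
params = 200e9
for precision, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    gigabytes = params * bytes_per_param / 1e9
    print(f"{precision}: ~{gigabytes:.0f} GB of weights")
# fp16 comes to roughly 400 GB, hence even a MacBook struggling.
```

Even aggressively quantized, that is far beyond a phone's memory, which is presumably why smaller variants would be needed. Let's move on now to a lighter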
development, albeit one that might affect hundreds of millions of people, including my nephew. 00:13:21.580 |
The online game platform Roblox is bringing in a new AI chatbot that's going to allow creators to 00:13:28.840 |
build virtual worlds just by typing prompts. And that's the crazy thing. All of this is going to 00:13:34.340 |
become intuitive to the next generation. Children today are just going to expect their apps to be 00:13:40.160 |
interactive and customizable on demand. And yes, we have covered a lot today, so let me know what 00:13:45.860 |
you think. I'm going to end with an AI image that has taken the internet by storm, as well as a few 00:13:52.400 |
more things. 00:13:56.340 |
Thanks as always for watching to the end. Do check out Metaculus in the description. And as ever,