The New Bard and Crazy AI Images, Videos, and Translations

00:00:00.000 | Just when I was going to do a video on the revolution that has happened in translation

00:00:04.980 | dubbing, along comes a new AI image technique that blew my mind. Got ready to film that,

00:00:11.320 | and then the new Bard comes out, which I have now tested in a dozen ways. I'm going to try to cover

00:00:17.200 | it all and show you how you can benefit, and I'll also cover how anyone with expertise in 26 diverse

00:00:23.960 | fields could contribute to red teaming what might become GPT-5. But let's start with Bard,

00:00:30.080 | where less than 24 hours ago we learned of Bard extensions. It's a bit like ChatGPT plugins,

00:00:36.380 | but it works specifically for YouTube, Gmail, Google Docs, Google Drive, etc. But if we cut

00:00:42.560 | out some of the hype, what would we actually use this for? Well, here's one use case that I will

00:00:47.300 | be using it for that I couldn't have done before. The image you can see is a photo I took while

00:00:52.040 | exploring some Roman remains.

00:00:53.940 | And where previously Bard could analyze the image and tell me a bit about Mithras,

00:00:58.860 | the god shown, it couldn't have done this. Recommend a YouTube video to learn more about

00:01:04.180 | this figure. Notice I didn't specify Mithras, it deduced just from the image that this was Mithras

00:01:09.820 | and then suggested a video for me. If this had been available while I was out exploring,

00:01:14.440 | I would have probably watched that video while looking through the ruins. I must say I really

00:01:19.720 | like the seamlessness of not having to separately analyze

00:01:23.780 | the image and then go onto YouTube and search. Instead, it all happens in one window in one app.

00:01:29.280 | Likewise, imagine seeing a travel image you like and you want to know about flying there. This was

00:01:34.580 | again a photo I took while traveling recently, and I didn't separately ask it what location is this

00:01:40.080 | and how would I fly there from London. Instead, it was all one request, flight duration and typical

00:01:46.140 | costs from London to location shown. It actually gave me additional information about the Fisherman's

00:01:51.920 | Bastion and then found specific

00:01:53.760 | flights using Google Flights with the airlines and the typical costs. And I can say that these

00:02:00.020 | flight durations and timings are accurate. Let me show you one more cool thing before we get to some

00:02:05.360 | of the anti-hype. I asked Bard to search my Google Drive for the recent SmartGPT doc and said write a

00:02:12.700 | Shakespearean sonnet to summarize it. And you can almost imagine this happening for the minutes of

00:02:17.620 | a meeting you've recently attended or anything that might be a little dry that you want to spice

00:02:22.320 | up a little bit. Now, yes, this sonnet doesn't

00:02:24.240 | accurately follow the Shakespearean template, but it's really not bad. Look at this. From humble

00:02:29.100 | prompts, thou dost create such light. Thy creators, AI explained and Josh, with foresight keen,

00:02:35.420 | have harnessed thy power to solve the unseen. With chain of thought, research and reflection keen,

00:02:40.780 | thou dost answer questions with precision keen. However, we are back to the problem of

00:02:46.540 | hallucinations. If you can't fully rely on what Bard is saying, its use drops massively. Now,

00:02:53.600 | I have successfully got Bard to read a lot of my Gmail messages. It can do that. And of course,

00:02:59.680 | some of the messages I get are comments on my YouTube channel. But this time when I asked,

00:03:04.440 | it simply made up feedback that I hadn't gotten. Let's be honest, you guys know YouTube. Do any

00:03:10.160 | YouTube comments sound like any of these three? It seems very rare to me to get a YouTube comment

00:03:16.160 | that is so neutral and polite and calm. Of course, I quickly checked and no, such a comment didn't

00:03:22.400 | exist. This

00:03:23.460 | was Bard hallucinating. In fact, all three were. And I just want to quickly show you

00:03:27.720 | it can sometimes find genuine comments like this one. But the problem is, what is the utility of

00:03:33.700 | searching for something like that if you can't rely on it? Also, sometimes you have to remind

00:03:38.420 | it to do things that you know it can do. For example, here, it successfully went through my

00:03:42.700 | Gmail and found my Replit bill. But when I asked how much more is my Replit monthly bill than my

00:03:48.160 | Eleven Labs bill, it said it can't find anything. But then when I prompted again and again,

00:03:53.320 | it first found the receipt from Eleven Labs and then compared it. But it wasn't smooth and I would

00:03:59.320 | say at this point, not terribly useful. Likewise, this image recognition of Darth Vader was great,

00:04:05.720 | even without naming the character, it got it. But the figures for gross revenue were unreliable. And

00:04:11.700 | yes, I did try asking to adjust for inflation and it still got it wrong. Now you might say, what

00:04:17.080 | about that button where you can check the validity of an answer? And I have tried that out a few

00:04:22.200 | times. It's

00:04:23.180 | this Google button here that says double-check

00:04:25.380 | response. After all, Bard powered by PaLM 2 is supposed to be the only language model to

00:04:31.360 | actively admit how confident it is in its response. And while this did work to flag up that the gross

00:04:38.080 | revenue figure might be inaccurate, saying you should consider researching further, it doesn't

00:04:43.400 | normally flag up too much, to be honest. Here, there was just a single green line for one answer

00:04:48.760 | to this mathematical problem, which it got wrong. This, by the way, is a

00:04:53.040 | classic mathematical question that I have tested GPT-4 on many times, and it also normally gets it

00:04:58.680 | wrong. But remember that thing about Bard being super honest when it doesn't know an answer. I

00:05:03.220 | would have been really impressed if this was full of orange and red, saying I'm really not sure.

00:05:08.140 | And I even directly prompted Bard asking this, how confident are you in this response? And it said,

00:05:13.920 | I am very confident in my response. I then pointed out where it was wrong,

00:05:18.160 | and it did admit the error. Anyway, I don't want to be too harsh. I do think these extensions

00:05:22.900 | are a significant improvement for Bard. Imagine you're on holiday and you see a castle like this

00:05:28.960 | in the distance. All I had to do was take a photo and write YouTube this. It realized what the castle

00:05:35.380 | was and gave me YouTube videos for it. So if I was walking along, I could watch that getting hyped

00:05:41.040 | about the castle I'm about to see. It goes without saying that, of course, I have been following the

00:05:45.740 | news, but I am still working my way through the Science paper. And it does seem like it's more

00:05:51.840 | of a step forward

00:05:52.760 | than a sea change, as the article in

00:05:56.120 | Nature points out. Of course, every step forward is welcome, though, in the fight against things

00:06:00.760 | like cystic fibrosis and cancer. I'll have more interesting bits on this in the future if I can

00:06:06.060 | find them. But now we move on to HeyGen Avatar 2.0, which I briefly demoed in the last video,

00:06:12.420 | and it shocked a lot of people. I wanted to quickly show you some of its capabilities

00:06:17.020 | in other languages. And I think this kind of technology is not only going to revolutionize

00:06:22.620 | movie dubbing, it's also maybe as soon as next year going to make us question all historical

00:06:27.880 | footage we see. Let's start with Oppenheimer in Spanish.

00:06:31.660 | It was obvious that the world would not be the same. Only a few dared to laugh at the situation.

00:06:38.940 | Some people cried. Most were silent.

00:06:44.600 | I remembered the phrase from the sacred Hindu texts, the Bhagavad Gita, which says,

00:06:52.600 | Now I am become Death, the destroyer of worlds.

00:06:57.880 | I suppose we all thought that, one way or another.

00:07:02.180 | And now the famous Network scene in German.

00:07:07.000 | You have to say, I'm a human being, damn it. My life has value.

00:07:12.180 | So I want you to get up right now and act.

00:07:16.500 | Please, all of you, get up out of your chairs. You should get up right now,

00:07:22.460 | go to the window, open it, stick your head out and yell:

00:07:27.200 | I'm so mad, and I'm not going to take this any longer.

00:07:30.760 | And now how about Andrej Karpathy in Hindi.

00:07:33.960 | Or who will make a humanoid robot on a large scale?

00:07:36.420 | This is a good form factor because the world is designed for a humanoid form factor.

00:07:41.200 | These things will be able to operate our machines. They will be able to sit in chairs.

00:07:46.720 | They will be able to drive cars. The world is designed for humans where you can invest in it.

00:07:52.440 | And we will be able to work with time.

00:07:54.440 | Or let's rewrite history and make Kennedy's famous speech in French.

00:07:59.440 | Because I have made a promise to you and the almighty God, the same solemn oath that our ancestors prescribed nearly a century and a quarter ago.

00:08:13.440 | And two more quick ones before we move on. These are the last two languages it can do.

00:08:18.440 | Here's Liam Neeson in Polish.

00:08:20.440 | Skills that make me your nightmare.

00:08:24.440 | If you let my daughter go, that's the end of it.

00:08:27.440 | I will not look for you. I will not pursue you.

00:08:31.440 | However, if you don't decide to run, I will look for you.

00:08:35.440 | I will find you and kill you, no matter how hard you try to hide.

00:08:40.440 | Something occurred to me. I fell into a peaceful sleep and didn't think about you anymore.

00:08:47.440 | Do you know what occurred to me? No.

00:08:50.420 | You're just a kid. You have no idea what you're talking about. Thank you. All right. You've never been out of Boston.

00:09:00.420 | I should note that this only works for translation. You can't just get someone to say anything, not without a consent form.

00:09:07.420 | That's why I don't think it will be as early as next year when a politician in a major election is able to claim that they lost due to deepfake imagery.

00:09:17.420 | I think it will be very close and I could see it happening in

00:09:20.400 | 2025 but not 2024. The guardrails are still quite tight and the technology isn't quite good enough.

00:09:27.500 | I'm of course predicting this on the wonderful Metaculus website. Not only are they sponsoring

00:09:32.840 | the video but it's actually free to sign up using the link in the description. As I mentioned in my

00:09:38.600 | last video you can see the aggregated forecast of hundreds of AI questions and I think my audience

00:09:44.680 | is going to be better informed than average so do sign up and start making your own forecasts.

00:09:50.520 | I genuinely want our community to be among the best informed on the planet in AI progress.

00:09:56.600 | Speaking of which I think many of you might be genuinely interested in some of the guests

00:10:01.100 | and topics I've got lined up with one of the most famous professors who argues that GPT models can't

00:10:06.840 | reason versus the recent news that GPT-3.5 Instruct can play chess and you may even hear the result of

00:10:14.020 | my game

00:10:14.660 | against GPT-3.5. But now it's time for this, the inventive AI image generation you saw at the start

00:10:22.060 | of the video. I'm going to show you not only how you can create it yourself but also how you can

00:10:27.140 | turn it into a video. Hopefully you can see the words AI Explained in this image and maybe you like a bit

00:10:33.420 | of movement like this. Here are some of the other generations if you prefer things to be a little

00:10:38.940 | bolder. And can you spot the letters AGI in this somewhat quirky video?

00:10:44.000 | I just love how two of the characters just yeet their way out of the video randomly.

00:10:48.500 | There's two quick tools I want to show you. The first one is fusion art where you get 25 free credits.

00:10:55.080 | What I've been doing is to create bold text in Adobe Express. I download the output and then click

00:11:02.080 | upload black and white image. I've changed it to 16 by 9 and then you can basically get creative

00:11:08.120 | with the prompt. You can see one of the other generations I came up with here. Then what I did

00:11:13.340 | is I took those images to Runway Gen 2. You can sign up for free and then you get 45 seconds of

00:11:20.060 | generation. You can see here what some other people have been doing. This is from Stealthy

00:11:24.920 | the Time Traveler. This by the way is the free and unlimited Hugging Face Space that you can

00:11:33.200 | use alternatively. And let me just give you some tips before I move on. The higher up

00:11:37.820 | you make the illusion strength the more you can see the original illusion. In this case

00:11:42.680 | that might be the letters AI Explained or in other cases it might be the background shapes or spirals.

00:11:48.740 | But one big benefit of reducing the illusion strength is that you get more interesting

00:11:53.420 | lifelike images. If you want different outputs you can also change the prompt of course. Or if

00:11:58.640 | you want to keep the prompt the same and get different outputs just click on advanced options

00:12:03.740 | and just randomly change the seed and you'll get a different output. If you ever find the

00:12:08.840 | resolution a little low there are free websites where you can

00:12:12.020 | upscale images, linked in the description. And

00:12:14.900 | this kind of stuff is only going to get crazier when it moves into the world of 3D as you can see

00:12:20.420 | here. But if you find testing for dangerous capabilities more compelling than generating

00:12:25.340 | AI images how about the OpenAI red teaming network announced less than 24 hours ago.

00:12:31.100 | Essentially if you are a domain expert in any of the 26 areas I'm about to show you,

00:12:36.560 | you can join up and get paid. And that's even if you can commit to as little as

00:12:41.360 | 5-10 hours in a year. Honestly if I didn't have a bit of a conflict of interest with the channel I

00:12:47.180 | would definitely have signed up to this. Getting paid to get early access to what could be GPT-5

00:12:52.520 | and red team it seems like a sweet deal to me. If you are a subject matter expert in any of these areas

00:12:58.520 | then do check out the link in the description. At the very least it should be highly interesting

00:13:04.100 | work. That's it for today a lot of diverse topics covered and a lot more to come. Do let me know in

00:13:09.740 | the comments if you found any of this video helpful.

00:13:10.700 | As always have a wonderful day.
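
A footnote on the image-generation tips from earlier in the video (illusion strength and the seed under advanced options). The sketch below is only a toy illustration of those two ideas, not how the Hugging Face Space or any image model actually works. The function names, the four-value "image", and the simple blend are all assumptions made up for this example. It shows that a fixed prompt and seed give a repeatable result, that changing only the seed changes the result, and that a higher strength keeps the output closer to the original black-and-white pattern.

```python
import random

# Hypothetical black-and-white control pattern (1.0 = white, 0.0 = black).
PATTERN = [1.0, 0.0, 1.0, 0.0]


def toy_generate(prompt: str, seed: int, illusion_strength: float) -> list[float]:
    """Toy stand-in for an image generator; returns four 'pixel' values."""
    # Seeding with prompt + seed makes the pseudo-random part repeatable.
    rng = random.Random(f"{prompt}|{seed}")
    noise = [rng.random() for _ in PATTERN]
    # Blend: higher strength keeps more of the pattern, lower keeps more noise.
    s = illusion_strength
    return [s * p + (1.0 - s) * n for p, n in zip(PATTERN, noise)]


def distance_from_pattern(pixels: list[float]) -> float:
    """Sum of absolute differences between the output and the control pattern."""
    return sum(abs(x - p) for x, p in zip(pixels, PATTERN))


if __name__ == "__main__":
    same_a = toy_generate("medieval village", seed=1, illusion_strength=0.5)
    same_b = toy_generate("medieval village", seed=1, illusion_strength=0.5)
    other = toy_generate("medieval village", seed=2, illusion_strength=0.5)
    print("same seed, same output:", same_a == same_b)
    print("new seed, same output:", same_a == other)
    for strength in (0.3, 0.9):
        pixels = toy_generate("medieval village", seed=1, illusion_strength=strength)
        print(strength, round(distance_from_pattern(pixels), 3))
```

In this toy, keeping the prompt fixed and changing only the seed mirrors the tip about getting a different output without rewriting the prompt, and the printed distance shrinks as the strength goes up.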