The New Bard and Crazy AI Images, Videos, and Translations


00:00:00.000 | Just when I was going to do a video on the revolution that has happened in translation
00:00:04.980 | dubbing, along comes a new AI image technique that blew my mind. Got ready to film that,
00:00:11.320 | and then the new BARD comes out, which I have now tested in a dozen ways. I'm going to try to cover
00:00:17.200 | it all and show you how you can benefit, and I'll also cover how anyone with expertise in 26 diverse
00:00:23.960 | fields could contribute to red teaming what might become GPT-5. But let's start with BARD,
00:00:30.080 | where less than 24 hours ago we learned of BARD extensions. It's a bit like ChatGPT plugins,
00:00:36.380 | but it works specifically for YouTube, Gmail, Google Docs, Google Drive, etc. But if we cut
00:00:42.560 | out some of the hype, what would we actually use this for? Well, here's one use case that I will
00:00:47.300 | be using it for that I couldn't have done before. The image you can see is a photo I took while
00:00:52.040 | exploring some Roman remains.
00:00:53.940 | And where previously BARD could analyze the image and tell me a bit about Mithras,
00:00:58.860 | the god shown, it couldn't have done this. Recommend a YouTube video to learn more about
00:01:04.180 | this figure. Notice I didn't specify Mithras, it deduced just from the image that this was Mithras
00:01:09.820 | and then suggested a video for me. If this had been available while I was out exploring,
00:01:14.440 | I would have probably watched that video while looking through the ruins. I must say I really
00:01:19.720 | like the seamlessness of not having to separately analyze
00:01:23.780 | the image and then go onto YouTube and search. Instead, it all happens in one window in one app.
00:01:29.280 | Likewise, imagine seeing a travel image you like and you want to know about flying there. This was
00:01:34.580 | again a photo I took while traveling recently, and I didn't separately ask it what location is this
00:01:40.080 | and how would I fly there from London. Instead, it was all one request, flight duration and typical
00:01:46.140 | costs from London to location shown. It actually gave me additional information about the Fisherman's
00:01:51.920 | Bastion and then found specific
00:01:53.760 | flights using Google Flights with the airlines and the typical costs. And I can say that these
00:02:00.020 | flight durations and timings are accurate. Let me show you one more cool thing before we get to some
00:02:05.360 | of the anti-hype. I asked Bard to search my Google Drive for the recent SmartGPT doc and said write a
00:02:12.700 | Shakespearean sonnet to summarize it. And you can almost imagine this happening for the minutes of
00:02:17.620 | a meeting you've recently attended or anything that might be a little dry that you want to spice
00:02:22.320 | up a little bit. Now, yes, this sonnet doesn't
00:02:24.240 | accurately follow the Shakespearean template, but it's really not bad. Look at this. From humble
00:02:29.100 | prompts, thou dost create such light. Thy creators, AI Explained and Josh, with foresight keen,
00:02:35.420 | have harnessed thy power to solve the unseen. With chain of thought, research and reflection keen,
00:02:40.780 | thou dost answer questions with precision keen. However, we are back to the problem of
00:02:46.540 | hallucinations. If you can't fully rely on what Bard is saying, its use drops massively. Now,
00:02:53.600 | I have successfully got Bard to read a lot of my Gmail messages. It can do that. And of course,
00:02:59.680 | some of the messages I get are comments on my YouTube channel. But this time when I asked,
00:03:04.440 | it simply made up feedback that I hadn't gotten. Let's be honest, you guys know YouTube. Do any
00:03:10.160 | YouTube comments sound like any of these three? It seems very rare to me to get a YouTube comment
00:03:16.160 | that is so neutral and polite and calm. Of course, I quickly checked and no, such a comment didn't
00:03:22.400 | exist. This
00:03:23.460 | was Bard hallucinating. In fact, all three were. And I just want to quickly show you
00:03:27.720 | it can sometimes find genuine comments like this one. But the problem is, what is the utility of
00:03:33.700 | searching for something like that if you can't rely on it? Also, sometimes you have to remind
00:03:38.420 | it to do things that you know it can do. For example, here, it successfully went through my
00:03:42.700 | Gmail and found my Replit bill. But when I asked how much more is my Replit monthly bill than my
00:03:48.160 | Eleven Labs bill, it said it can't find anything. But then when I prompted again and again,
00:03:53.320 | it first found the receipt from Eleven Labs and then compared it. But it wasn't smooth and I would
00:03:59.320 | say at this point, not terribly useful. Likewise, this image recognition of Darth Vader was great,
00:04:05.720 | even without naming the character, it got it. But the figures for gross revenue were unreliable. And
00:04:11.700 | yes, I did try asking to adjust for inflation and it still got it wrong. Now you might say, what
00:04:17.080 | about that button where you can check the validity of an answer? And I have tried that out a few
00:04:22.200 | times. It's
00:04:23.180 | this Google button here that says double-check
00:04:25.380 | response. After all, BARD powered by PaLM 2 is supposed to be the only language model to
00:04:31.360 | actively admit how confident it is in its response. And while this did work to flag up that the gross
00:04:38.080 | revenue figure might be inaccurate, saying you should consider researching further, it doesn't
00:04:43.400 | normally flag up too much, to be honest. Here, there was just a single green line for one answer
00:04:48.760 | to this mathematical problem, which it got wrong. This, by the way, is a
00:04:53.040 | classic mathematical question that I have tested GPT-4 on many times, and it also normally gets it
00:04:58.680 | wrong. But remember that thing about BARD being super honest when it doesn't know an answer. I
00:05:03.220 | would have been really impressed if this was full of orange and red, saying I'm really not sure.
00:05:08.140 | And I even directly prompted BARD asking this, how confident are you in this response? And it said,
00:05:13.920 | I am very confident in my response. I then pointed out where it was wrong,
00:05:18.160 | and it did admit the error. Anyway, I don't want to be too harsh. I do think these extensions
00:05:22.900 | are a significant improvement for BARD. Imagine you're on holiday and you see a castle like this
00:05:28.960 | in the distance. All I had to do was take a photo and write YouTube this. It realized what the castle
00:05:35.380 | was and gave me YouTube videos for it. So if I was walking along, I could watch that getting hyped
00:05:41.040 | about the castle I'm about to see. It goes without saying that, of course, I have been following the
00:05:45.740 | news, but I am still working my way through the Science paper. And it does seem like it's more
00:05:51.840 | of a step forward
00:05:52.760 | than a sea change, as the article in
00:05:56.120 | Nature points out. Of course, every step forward is welcome, though, in the fight against things
00:06:00.760 | like cystic fibrosis and cancer. I'll have more interesting bits on this in the future if I can
00:06:06.060 | find them. But now we move on to HeyGen Avatar 2.0, which I briefly demoed in the last video,
00:06:12.420 | and it shocked a lot of people. I wanted to quickly show you some of its capabilities
00:06:17.020 | in other languages. And I think this kind of technology is not only going to revolutionize
00:06:22.620 | movie dubbing, it's also maybe as soon as next year going to make us question all historical
00:06:27.880 | footage we see. Let's start with Oppenheimer in Spanish.
00:06:31.660 | It was obvious that the world would not be the same. Only a few dared to laugh at the situation.
00:06:38.940 | Some people cried. Most were silent.
00:06:44.600 | I remembered the phrase from the sacred Hindu text, the Bhagavad Gita, which says,
00:06:52.600 | Now I have become death, the destroyer of worlds.
00:06:57.880 | I suppose we all thought that, one way or another.
00:07:02.180 | And now the famous Network scene in German.
00:07:07.000 | You have to say, I'm a human being, damn it. My life has value.
00:07:12.180 | So I want you to get up right now and act.
00:07:16.500 | Everyone, please get up out of your chairs. You should get up right now.
00:07:22.460 | Go to the window, open it, stick your head out and yell.
00:07:27.200 | I'm so angry and I'm not going to take this any longer.
00:07:30.760 | And now how about Andrej Karpathy in Hindi.
00:07:33.960 | Or who will make a humanoid robot on a large scale?
00:07:36.420 | This is a good form factor because the world is designed for a humanoid form factor.
00:07:41.200 | These things will be able to operate our machines. They will be able to sit in chairs.
00:07:46.720 | They will be able to drive cars. The world is designed for humans where you can invest in it.
00:07:52.440 | And we will be able to work with time.
00:07:54.440 | Or let's rewrite history and make Kennedy's famous speech in French.
00:07:59.440 | Because I have made a promise to you and the almighty God, the same solemn oath that our ancestors prescribed nearly a century and a quarter ago.
00:08:13.440 | And two more quick ones before we move on. These are the last two languages it can do.
00:08:18.440 | Here's Liam Neeson in Polish.
00:08:20.440 | Skills that make me your nightmare.
00:08:24.440 | If you release my daughter, that's the end of it.
00:08:27.440 | I will not look for you. I will not pursue you.
00:08:31.440 | However, if you don't decide to run away, I will look for you.
00:08:35.440 | I will find you and I will kill you, no matter how hard you try to hide.
00:08:40.440 | Something occurred to me. I fell into a peaceful sleep and didn't think about you anymore.
00:08:47.440 | Do you know what occurred to me? No.
00:08:50.420 | You're just a kid. You have no idea what you're talking about. Thank you. That's fine. You've never been out of Boston.
00:09:00.420 | I should note that this only works for translation. You can't just get someone to say anything, not without a consent form.
00:09:07.420 | That's why I don't think it will be as early as next year when a politician in a major election is able to claim that they lost due to deepfake imagery.
00:09:17.420 | I think it will be very close and I could see it happening in
00:09:20.400 | 2025 but not 2024. The guardrails are still quite tight and the technology isn't quite good enough.
00:09:27.500 | I'm of course predicting this on the wonderful Metaculus website. Not only are they sponsoring
00:09:32.840 | the video but it's actually free to sign up using the link in the description. As I mentioned in my
00:09:38.600 | last video you can see the aggregated forecast of hundreds of AI questions and I think my audience
00:09:44.680 | is going to be better informed than average so do sign up and start making your own forecasts.
00:09:50.520 | I genuinely want our community to be among the best informed on the planet in AI progress.
00:09:56.600 | Speaking of which I think many of you might be genuinely interested in some of the guests
00:10:01.100 | and topics I've got lined up with one of the most famous professors who argues that GPT models can't
00:10:06.840 | reason versus the recent news that GPT-3.5 Instruct can play chess and you may even hear the result of
00:10:14.020 | my game
00:10:14.660 | against GPT-3.5. But now it's time for this: the inventive AI image generation you saw at the start
00:10:22.060 | of the video. I'm going to show you not only how you can create it yourself but also how you can
00:10:27.140 | turn it into a video. Hopefully you can see the AI Explained in this image and maybe you like a bit
00:10:33.420 | of movement like this. Here are some of the other generations if you prefer things to be a little
00:10:38.940 | bolder. And can you spot the letters AGI in this somewhat quirky video?
00:10:44.000 | I just love how two of the characters just yeet their way out of the video randomly.
00:10:48.500 | There's two quick tools I want to show you. The first one is fusion art where you get 25 free credits.
00:10:55.080 | What I've been doing is to create bold text in Adobe Express. I download the output and then click
00:11:02.080 | upload black and white image. I've changed it to 16 by 9 and then you can basically get creative
00:11:08.120 | with the prompt. You can see one of the other generations I came up with here. Then what I did
00:11:13.340 | is I took those images to Runway Gen 2. You can sign up for free and then you get 45 seconds of
00:11:20.060 | generation. You can see here what some other people have been doing. This is from Stealthy
00:11:24.920 | the Time Traveler. This by the way is the free and unlimited Hugging Face Space that you can
00:11:33.200 | use alternatively. And let me just give you some tips before I move on. The higher up
00:11:37.820 | you make the illusion strength the more you can see the original illusion. In this case
00:11:42.680 | that might be the letters AI Explained or in other cases it might be the background shapes or spirals.
00:11:48.740 | But one big benefit of reducing the illusion strength is that you get more interesting
00:11:53.420 | lifelike images. If you want different outputs you can also change the prompt of course. Or if
00:11:58.640 | you want to keep the prompt the same and get different outputs just click on advanced options
00:12:03.740 | and just randomly change the seed and you'll get a different output. If you ever find the
00:12:08.840 | resolution a little low there are free websites where you can upscale
00:12:12.020 | images, linked in the description. And
00:12:14.900 | this kind of stuff is only going to get crazier when it moves into the world of 3D as you can see
00:12:20.420 | here. But if you find testing for dangerous capabilities more compelling than generating
00:12:25.340 | AI images how about the OpenAI red teaming network announced less than 24 hours ago.
00:12:31.100 | Essentially if you are a domain expert in any of the 26 areas I'm about to show you,
00:12:36.560 | you can join up and get paid. And that's even if you can commit to as little as
00:12:41.360 | 5-10 hours in a year. Honestly if I didn't have a bit of a conflict of interest with the channel I
00:12:47.180 | would definitely have signed up to this. Getting paid to get early access to what could be GPT-5
00:12:52.520 | and red team it seems like a sweet deal to me. If you are a subject matter expert in any of these areas
00:12:58.520 | then do check out the link in the description. At the very least it should be highly interesting
00:13:04.100 | work. That's it for today a lot of diverse topics covered and a lot more to come. Do let me know in
00:13:09.740 | the comments if you found any of this video helpful.
00:13:10.700 | As always have a wonderful day.