
The New Bard and Crazy AI Images, Videos, and Translations


Transcript

Just when I was going to do a video on the revolution that has happened in translation dubbing, along comes a new AI image technique that blew my mind. Got ready to film that, and then the new BARD comes out, which I have now tested in a dozen ways. I'm going to try to cover it all and show you how you can benefit, and I'll also cover how anyone with expertise in 26 diverse fields could contribute to red teaming what might become GPT-5.

But let's start with BARD, where less than 24 hours ago we learned of BARD extensions. It's a bit like ChatGPT plugins, but it works specifically for YouTube, Gmail, Google Docs, Google Drive, etc. But if we cut out some of the hype, what would we actually use this for? Well, here's one use case that I will be using it for that I couldn't have done before.

The image you can see is a photo I took while exploring some Roman remains. And where previously BARD could analyze the image and tell me a bit about Mithras, the god shown, it couldn't have done this. Recommend a YouTube video to learn more about this figure. Notice I didn't specify Mithras, it deduced just from the image that this was Mithras and then suggested a video for me.

If this had been available while I was out exploring, I would have probably watched that video while looking through the ruins. I must say I really like the seamlessness of not having to separately analyze the image and then go onto YouTube and search.

Instead, it all happens in one window in one app. Likewise, imagine seeing a travel image you like and you want to know about flying there. This was again a photo I took while traveling recently, and I didn't separately ask it what location is this and how would I fly there from London.

Instead, it was all one request: flight duration and typical costs from London to location shown. It actually gave me additional information about the Fisherman's Bastion and then found specific flights using Google Flights, with the airlines and the typical costs.

And I can say that these flight durations and timings are accurate. Let me show you one more cool thing before we get to some of the anti-hype. I asked Bard to search my Google Drive for the recent SmartGPT doc and said write a Shakespearean sonnet to summarize it. And you can almost imagine this happening for the minutes of a meeting you've recently attended or anything that might be a little dry that you want to spice up a little bit.

Now, yes, this sonnet is a bit of a pain in the ass, but it's not a big deal. It's a very simple way to get a good idea of what's going on. The sonnet doesn't accurately follow the Shakespearean template, but it's really not bad.

Look at this: From humble prompts, thou dost create such light. Thy creators, AI Explained and Josh, with foresight keen, have harnessed thy power to solve the unseen. With chain of thought, research and reflection keen, thou dost answer questions with precision keen. However, we are back to the problem of hallucinations.

If you can't fully rely on what Bard is saying, its use drops massively. Now, I have successfully got Bard to read a lot of my Gmail messages. It can do that. And of course, some of the messages I get are comments on my YouTube channel. But this time when I asked, it simply made up feedback that I hadn't gotten.

Let's be honest, you guys know YouTube. Do any YouTube comments sound like any of these three? It seems very rare to me to get a YouTube comment that is so neutral and polite and calm. Of course, I quickly checked and no, such a comment didn't exist.

I didn't get any of these three. This was Bard hallucinating. In fact, all three were. And I just want to quickly show you it can sometimes find genuine comments like this one. But the problem is, what is the utility of searching for something like that if you can't rely on it?

Also, sometimes you have to remind it to do things that you know it can do. For example, here, it successfully went through my Gmail and found my Replit bill. But when I asked how much more is my Replit monthly bill than my Eleven Labs bill, it said it can't find anything.

But then when I prompted again and again, it first found the receipt from Eleven Labs and then compared it. But it wasn't smooth and I would say at this point, not terribly useful. Likewise, this image recognition of Darth Vader was great, even without naming the character, it got it.

But the figures for gross revenue were unreliable. And yes, I did try asking to adjust for inflation and it still got it wrong. Now you might say, what about that button where you can check the validity of an answer? And I have tried that out a few times.

It's this Google button here that you can press that says double-check response. After all, BARD powered by PaLM 2 is supposed to be the only language model to actively admit how confident it is in its response. And while this did work to flag up that the gross revenue figure might be inaccurate, saying you should consider researching further, it doesn't normally flag up too much, to be honest.

Here, there was just a single green line for one answer to this mathematical problem, which it got wrong. This, by the way, is a classic mathematical question that I have tested GPT-4 on many times, and it also normally gets it wrong. But remember that thing about BARD being super honest when it doesn't know an answer.

I would have been really impressed if this was full of orange and red, saying I'm really not sure. And I even directly prompted BARD asking this, how confident are you in this response? And it said, I am very confident in my response. I then pointed out where it was wrong, and it did admit the error.

Anyway, I don't want to be too harsh. I do think these extensions are a significant improvement for BARD. Imagine you're on holiday and you see a castle like this in the distance. All I had to do was take a photo and write YouTube this. It realized what the castle was and gave me YouTube videos for it.

So if I was walking along, I could watch that, getting hyped about the castle I'm about to see. It goes without saying that, of course, I have been following the news, but I am still working my way through the Science paper.

And it does seem like it's more of a step forward than a sea change, as the article in Nature points out. Of course, every step forward is welcome, though, in the fight against things like cystic fibrosis and cancer.

I'll have more interesting bits on this in the future if I can find them. But now we move on to HeyGen Avatar 2.0, which I briefly demoed in the last video, and it shocked a lot of people. I wanted to quickly show you some of its capabilities in other languages.

And I think this kind of technology is not only going to revolutionize movie dubbing, it's also maybe as soon as next year going to make us question all historical footage we see. Let's start with Oppenheimer in Spanish. It was obvious that the world would not be the same. Only a few dared to laugh at the situation.

Some people cried. Most were silent. I remembered the phrase from the sacred Hindu texts, the Bhagavad Gita 1, which says, Ahora me he convertido en la muerte, el destructor de mundos. Supongo que todos pensamos de una forma u otra. And now the famous Network scene in German. Du musst sagen, dass ich ein Mensch bin, verdammt nochmal.

Mein Leben ist wertvoll. Also möchte ich, dass du jetzt sofort aufstehst und handelst. Bitte steht alle von euren Stühlen auf. Du sollst jetzt sofort aufstehen. Zum Fenster gehen, es öffnen und deinen Kopf herausstrecken und schreien. Ich bin so wütend und werde das nicht länger hinnehmen. And now how about Andrej Karpathy in Hindi.

Or who will make a humanoid robot on a large scale? This is a good form factor because the world is designed for a humanoid form factor. These things will be able to operate our machines. They will be able to sit in chairs. They will be able to drive cars.

The world is designed for humans where you can invest in it. And we will be able to work with time. Or let's rewrite history and make Kennedy's famous speech in French. Because I have made a promise to you and the almighty God, the same Solomon that our ancestors prescribed nearly a century and a quarter ago.

And two more quick ones before we move on. These are the last two languages it can do. Here's Liam Neeson in Polish. Umiejętności, które sprawiają, że jestem twoim koszmarem. Jeśli uwolnisz moją córkę, to koniec. Nie będę cię szukać. Nie będę cię ścigał. Jednak jeśli nie zdecydujesz się uciekać, będę cię szukać.

Znajdę cię i zabiję, bez względu na to, jak bardzo będziesz się starać ukryć. Mi è venuto in mente qualcosa. Mi sono addormentato in un sonno tranquillo e non pensato più a te. Sai cosa mi è venuto in mente? No. Sei solo un bambino. Non hai idea di quello di cui parli.

Grazie. Va bene. Non sei mai stato fuori da Boston. I should note that this only works for translation. You can't just get someone to say anything, not without a consent form. That's why I don't think it will be as early as next year when a politician in a major election is able to claim that they lost due to deepfake imagery.

I think it will be very close and I could see it happening in 2025 but not 2024. The guardrails are still quite tight and the technology isn't quite good enough. I'm of course predicting this on the wonderful Metaculus website. Not only are they sponsoring the video but it's actually free to sign up using the link in the description.

As I mentioned in my last video you can see the aggregated forecast of hundreds of AI questions and I think my audience is going to be better informed than average so do sign up and start making your own forecasts. I genuinely want our community to be among the best informed on the planet in AI progress.

Speaking of which, I think many of you might be genuinely interested in some of the guests and topics I've got lined up, with one of the most famous professors who argues that GPT models can't reason versus the recent news that GPT 3.5 instruct can play chess, and you may even hear the result of my game against GPT 3.5.

But now it's time for the inventive AI image generation you saw at the start of the video. I'm going to show you not only how you can create it yourself but also how you can turn it into a video. Hopefully you can see the words AI Explained in this image, and maybe you like a bit of movement like this.

Here are some of the other generations if you prefer things to be a little bolder. And can you spot the letters AGI in this somewhat quirky video? I just love how two of the characters just yeet their way out of the video randomly. There's two quick tools I want to show you.

The first one is fusion art where you get 25 free credits. What I've been doing is to create bold text in Adobe Express. I download the output and then click upload black and white image. I've changed it to 16 by 9 and then you can basically get creative with the prompt.

You can see one of the other generations I came up with here. Then what I did is I took those images to Runway Gen 2. You can sign up for free and then you get 45 seconds of generation. You can see here what some other people have been doing.

This is from Stealthy the Time Traveler. This by the way is the free and unlimited Hugging Face Space that you can use alternatively. And let me just give you some tips before I move on. The higher up you make the illusion strength the more you can see the original illusion.

In this case that might be the letters AI Explained or in other cases it might be the background shapes or spirals. But one big benefit of reducing the illusion strength is that you get more interesting lifelike images. If you want different outputs you can also change the prompt of course.

Or if you want to keep the prompt the same and get different outputs just click on advanced options and just randomly change the seed and you'll get a different output. If you ever find the resolution a little low there are free websites where you can upscale images, linked in the description.

And this kind of stuff is only going to get crazier when it moves into the world of 3D as you can see here. But if you find testing for dangerous capabilities more compelling than generating AI images how about the OpenAI red teaming network announced less than 24 hours ago.

Essentially if you are a domain expert in any of the 26 areas I'm about to show you, you can join up and get paid. And that's even if you can commit to as little as 5-10 hours in a year. Honestly if I didn't have a bit of a conflict of interest with the channel I would definitely have signed up to this.

Getting paid to get early access to what could be GPT-5 and red team it seems like a sweet deal to me. If you are a subject matter expert in any of these areas then do check out the link in the description. At the very least it should be highly interesting work.

That's it for today a lot of diverse topics covered and a lot more to come. Do let me know in the comments if you found any of this video helpful. If you found this video helpful please like, share and subscribe. As always have a wonderful day.