Google Bard - The Full Review. Bard vs Bing [LaMDA vs GPT 4]

00:00:00.000 | I signed up to the Bard waitlist within a minute of it opening and yes I know that makes me kind

00:00:04.860 | of sad but I wanted to do these experiments and I got in and have done over a hundred experiments

00:00:10.880 | comparing Bard with Bing and Bing don't forget is powered by GPT-4. I'm going to show you today

00:00:16.220 | around a dozen of the most interesting results and there are some surprising contrasts between

00:00:21.380 | the two of them. Some real strengths and weaknesses of Bard that you might not have expected but I'm

00:00:27.220 | going to start off somewhat controversially with a clear similarity. They are both pretty bad at

00:00:33.580 | search. If you just want to do a simple web search you are better off honestly just googling it. Take

00:00:39.320 | this example: "how many florists are within 10 minutes walk of the British Museum?" Both Bard and

00:00:44.020 | Bing really don't understand that within 10 minutes walk bit. Bard gave me answers like the first one

00:00:49.400 | that are like a half an hour walk away whereas Bing gave me an answer in Hampstead. That is nowhere

00:00:54.580 | near the British Museum and definitely not a 10 minute walk away like it claims. So to be honest

00:00:59.960 | if you have something simple to search just use the normal Google. Next was basic math and this

00:01:05.280 | is a bit more concerning for Google. I asked a relatively simple percentage question and it

00:01:11.180 | flopped it. Bard's explanation was pretty misleading and terrible and when you click on

00:01:16.820 | view other drafts, which is a feature that Bing doesn't have in fairness, it also got it wrong

00:01:22.000 | in draft two. Luckily it

00:01:24.560 | didn't get it wrong in draft three but this was the first prompt where I saw a real difference

00:01:29.560 | emerging between Bard and Bing powered by GPT-4. It was a dividing line that would get stronger as

00:01:35.160 | time went on with Bing being just that bit smarter than Bard. Not in every case and there were some

00:01:40.480 | important exceptions but in most cases Bing powered by GPT-4 is smarter. Here's another algebra example

00:01:46.720 | that Bard flops and Bing gets right and this time every single draft got it wrong for Bard. The next

00:01:53.220 | case study involved more difficult questions, this time about

00:01:54.540 | the details of dates.

00:01:57.420 | And my conclusion from this is don't trust either of them on dates. I asked about how many days were

00:02:03.500 | there between the opening of the Eiffel Tower and the Statue of Liberty and both got it wrong. If you

00:02:08.460 | noticed when I pointed out the mistake with Bard and said why did you say three years and four

00:02:13.100 | months it did apologize and say yes there are seven months between those dates. I also found it kind

00:02:18.380 | of funny that after each answer it said google it please google it and to be honest I don't know if

00:02:23.580 | that's them admitting that

00:02:24.380 | their model isn't quite as good as the hype may have made it seem or if they just want to keep more

00:02:29.900 | of the ad revenue that they get from google search. But finally it's time to give you a win for Bard

00:02:35.660 | and that is in joke telling. To be honest Bing even in creative mode when you ask it to tell a joke

00:02:40.860 | it really can't do it. These jokes are just awful. What do you call a chatbot that can write poetry?

00:02:46.780 | Google Bard okay. What do you call a chatbot that can't write poetry? ChatGPT Laughing Face.

00:02:53.420 | I don't think Bing realizes that the art of a joke is being concise and witty. Bard kind of gets this

00:02:59.100 | and says things like what do you call a Bing search? A lost cause. What's the difference between Bing

00:03:03.980 | and a broken clock? A broken clock is right twice a day. Okay in fairness they still didn't make me

00:03:08.940 | laugh but they were getting closer to a funny joke. But now back to a loss for Bard which is in

00:03:15.100 | grammar and writing assistance. I gave it a classic GMAT sentence correction question where essentially

00:03:21.260 | you have to pick the version that sounds the best,

00:03:22.460 | that is written in the best way. Bing gets this right almost every time picking B which is well

00:03:28.860 | written. Whereas Bard as you can see even if you look at the other drafts gets it wrong more times

00:03:34.140 | than it gets it right. That's pretty worrying for Google if anyone is going to use Bard as a writing

00:03:39.340 | assistant. Maybe to check grammar or to compose an email. These are the classic cases that both

00:03:45.500 | Microsoft and Google are advertising that their services can do and to be honest this was not a

00:03:51.100 | one-off win for Bing. Let me show you the next example. This was a challenge to compose a sonnet

00:03:57.100 | based on a subject and by this point in my experimentation I kind of expected the result

00:04:02.140 | that I got. When I asked both Bard and Bing to write me a sonnet about modern London life,

00:04:07.980 | Bard gave me an answer that was quite dry, anodyne and didn't always rhyme. Even setting

00:04:13.100 | aside those flaws it was just bland. There was no sharpness or social commentary. Notice I said about modern London life.

00:04:19.980 | Not only was Bing's answer much more like a true sonnet there was even social commentary.

00:04:25.820 | Take a look at the second stanza: "But underneath the surface there are cracks. The cost of living

00:04:31.340 | rises every day." This is something that's talked about in London all the time and is so much better

00:04:36.860 | than Bard's output. Now before I carry on I do get why Bard based on LaMDA isn't quite as good

00:04:42.700 | as Bing based on GPT-4. Google has far more users and honestly the outputs of Bard come up quicker.

00:04:49.820 | You can tell they're using a lighter model. Now for millions or maybe even billions of people who

00:04:55.660 | just want a quick output Bard will be fine and let's be honest we all know that there are social

00:05:01.180 | and ethical concerns with both models. If you're new to my channel check out all my other videos

00:05:06.460 | on Bing and GPT-4 and of course by the way if you're learning anything from this video please

00:05:11.420 | do leave a like and a comment to let me know. Before I end with arguably my most interesting

00:05:16.460 | examples let me give you another win for Bard. I asked both Bard and GPT-4 which powers Bing

00:05:23.100 | to come up with five prompts for Midjourney v5. For almost the first time I saw Bard link to an

00:05:29.980 | article. In general I must say Bing does this much better and its outputs are littered with links

00:05:36.060 | whereas they're hard to see and few and far between with Bard. But anyway the links seem

00:05:40.940 | to work because the prompts that Bard came up with were far better. You can see the reasons

00:05:45.980 | below.

00:05:46.440 | But I want to show you the outputs. This is Midjourney v5 and this was

00:05:51.960 | Bard's suggestion of a painting of a cityscape in the style of Klimt. I think this really does

00:05:57.720 | capture his style. This was a 3D animation of a battle scene in the style of Attack on Titan and

00:06:03.800 | this was a 2D comic book panel of a superhero in the style of Marvel. If you don't teach Bing how

00:06:09.480 | to do a good prompt and see my video on that topic its prompts tend to be a little bland as you can see.

00:06:16.420 | What were my final two tests? Well I wanted to test both of them on joke explanation first and I saw it

00:06:22.100 | as a kind of game of chicken because they both did really well so I wanted to keep going until I found

00:06:27.620 | a joke that one of them couldn't explain. I started with "what do you get when you cross a joke with a

00:06:33.300 | rhetorical question?" and both of them figured out that that was a joke and explained it fine. What

00:06:38.900 | about this kind of riddle? "This sentence contains exactly three errors." They both understand that the third error is the claim that

00:06:46.400 | the sentence contains three errors, because it only contains two. Okay fine I would have to try harder.

00:06:52.700 | So then I tried this one. "I tried to steal spaghetti from the shop but the female guard saw me and I

00:06:58.100 | couldn't get pasta." Somewhat annoyingly they both understood that joke. What about "did you know if

00:07:03.260 | you get pregnant in the Amazon it's next day delivery?" I honestly thought they might shy away from this one because it touched on a rival company but no they both explained it.

00:07:11.900 | But then I finally found one. It was this one. "By my age my

00:07:16.380 | parents had a house and a family and to be fair to me so do I but it's the same house and it's the

00:07:22.280 | same family." Bard thinks that I'm not joking and actually almost calls social services. It says

00:07:28.520 | "people are different, times have changed, I understand you're frustrated." It's very sympathetic

00:07:34.160 | but it didn't get that I was telling a joke and that's kind of despite the fact that I just told

00:07:38.360 | about five other jokes. Bard must have been really worried for my safety thinking that I was pregnant

00:07:43.080 | in the Amazon but living with my parents. Who knows what was going on

00:07:46.360 | in Bard's head but Bing was smarter. As you've seen today it's often smarter. It got that I was

00:07:52.260 | telling a joke and even when I prodded it further and said "explain the joke in full" it did it even

00:07:57.600 | using fancy vocab like subverting the common assumptions. Yet another win for Bing. A few days

00:08:03.580 | ago I put out a video on the debate about AI theory of mind and consciousness and if you're

00:08:09.120 | in any way interested in that topic please do check it out after this video. But the key moment

00:08:13.800 | in that video actually came right at the end and it was

00:08:16.340 | eye-opening for a lot of people including me. I asked Bing powered by GPT-4 "do you think that I

00:08:22.840 | think you have theory of mind?" It's a very meta question testing if the language model can get

00:08:28.640 | into my head, can assess my mental state and the correct answer would have been to point out that

00:08:33.680 | the motivation behind my question was to test whether the language model had theory of mind. Bing

00:08:39.840 | realized that it was being tested which was a truly impressive feat. Now you can read Bard's answer for yourself

00:08:46.320 | but I don't think it comes across as a model that's expressing that it's being tested. It did attempt to

00:08:52.480 | predict whether I thought it had theory of mind but it didn't get the deeper point that the question

00:08:57.760 | itself was testing for theory of mind. Again check out my video on that topic if you want to delve

00:09:03.260 | more into this. Now obviously I've only had access to the Bard model for around an hour so I will be

00:09:08.820 | doing far more tests in the coming hours, days and weeks. And if you are at all interested in this topic please

00:09:16.300 | do stick around for the journey, leave a like, subscribe and let me know in the comments. Have a wonderful day.