
Google Bard - The Full Review. Bard vs Bing [LaMDA vs GPT 4]



00:00:00.000 | I signed up to the Bard waitlist within a minute of it opening and yes I know that makes me kind
00:00:04.860 | of sad but I wanted to do these experiments and I got in and have done over a hundred experiments
00:00:10.880 | comparing Bard with Bing and Bing don't forget is powered by GPT-4. I'm going to show you today
00:00:16.220 | around a dozen of the most interesting results and there are some surprising contrasts between
00:00:21.380 | the two of them. Some real strengths and weaknesses of Bard that you might not have expected but I'm
00:00:27.220 | going to start off somewhat controversially with a clear similarity. They are both pretty bad at
00:00:33.580 | search. If you just want to do a simple web search you are better off honestly just googling it. Take
00:00:39.320 | this example how many florists are within 10 minutes walk of the British Museum? Both Bard and
00:00:44.020 | Bing really don't understand that within 10 minutes walk bit. Bard gave me answers like the first one
00:00:49.400 | that are like a half an hour walk away whereas Bing gave me an answer in Hampstead. That is nowhere
00:00:54.580 | near the British Museum and definitely not a 10 minute walk away like it claims. So to be honest
00:00:59.960 | if you have something simple to search just use the normal Google. Next was basic math and this
00:01:05.280 | is a bit more concerning for Google. I asked a relatively simple percentage question and it
00:01:11.180 | flopped it. Bard's explanation was pretty misleading and terrible and when you click on
00:01:16.820 | view other drafts which is a feature that Bing doesn't have in fairness it also got it wrong
00:01:22.000 | in draft two. Luckily it
00:01:24.560 | didn't get it wrong in draft three but this was the first prompt where I saw a real difference
00:01:29.560 | emerging between Bard and Bing powered by GPT-4. It was a dividing line that would get stronger as
00:01:35.160 | time went on with Bing being just that bit smarter than Bard. Not in every case and there were some
00:01:40.480 | important exceptions but in most cases Bing powered by GPT-4 is smarter. Here's another algebra example
00:01:46.720 | that Bard flops and Bing gets right and this time every single draft got it wrong for Bard. The next
00:01:53.220 | case study involved more difficult questions, about the details of dates.
00:01:57.420 | And my conclusion from this is don't trust either of them on dates. I asked about how many days were
00:02:03.500 | there between the opening of the Eiffel Tower and the Statue of Liberty and both got it wrong. If you
00:02:08.460 | noticed when I pointed out the mistake with Bard and said why did you say three years and four
00:02:13.100 | months it did apologize and say yes there are seven months between those dates. I also found it kind
00:02:18.380 | of funny that after each answer it said google it please google it and to be honest I don't know if
00:02:23.580 | that's them admitting that
00:02:24.380 | their model isn't quite as good as the hype may have made it seem or if they just want to keep more
00:02:29.900 | of the ad revenue that they get from google search. But finally it's time to give you a win for Bard
00:02:35.660 | and that is in joke telling. To be honest Bing even in creative mode when you ask it to tell a joke
00:02:40.860 | it really can't do it. These jokes are just awful. "What do you call a chatbot that can write poetry?
00:02:46.780 | Google Bard." Okay. "What do you call a chatbot that can't write poetry? ChatGPT." Laughing face.
00:02:53.420 | I don't think Bing realizes that the art of a joke is being concise and witty. Bard kind of gets this
00:02:59.100 | and says things like: What do you call a Bing search? A lost cause. What's the difference between Bing
00:03:03.980 | and a broken clock? A broken clock is right twice a day. Okay in fairness they still didn't make me
00:03:08.940 | laugh but they were getting closer to a funny joke. But now back to a loss for Bard which is in
00:03:15.100 | grammar and writing assistance. I gave it a classic GMAT sentence correction question where essentially
00:03:21.260 | you have to pick the version that sounds the best,
00:03:22.460 | that is written in the best way. Bing gets this right almost every time picking B which is well
00:03:28.860 | written. Whereas Bard as you can see even if you look at the other drafts gets it wrong more times
00:03:34.140 | than it gets it right. That's pretty worrying for Google if anyone is going to use Bard as a writing
00:03:39.340 | assistant. Maybe to check grammar or to compose an email. These are the classic cases that both
00:03:45.500 | Microsoft and Google are advertising that their services can do and to be honest this was not a
00:03:51.100 | one-off win for Bing. Let me show you the next example. This was a challenge to compose a sonnet
00:03:57.100 | based on a subject and by this point in my experimentation I kind of expected the result
00:04:02.140 | that I got. When I asked both Bard and Bing to write me a sonnet about modern London life,
00:04:07.980 | Bard gave me an answer that was quite dry, anodyne and didn't always rhyme. Even setting
00:04:13.100 | aside those flaws it was just bland. There was no sharpness or social commentary. Notice I said about modern London life.
00:04:19.980 | Not only was Bing's answer much more like a true sonnet there was even social commentary.
00:04:25.820 | Take a look at the second stanza but underneath the surface there are cracks. The cost of living
00:04:31.340 | rises every day. This is something that's talked about in London all the time and is so much better
00:04:36.860 | than Bard's output. Now before I carry on I do get why Bard based on LaMDA isn't quite as good
00:04:42.700 | as Bing based on GPT-4. Google has far more users and honestly the outputs of Bard come up quicker.
00:04:49.820 | You can tell they're using a lighter model. Now for millions or maybe even billions of people who
00:04:55.660 | just want a quick output Bard will be fine and let's be honest we all know that there are social
00:05:01.180 | and ethical concerns with both models. If you're new to my channel check out all my other videos
00:05:06.460 | on Bing and GPT-4 and of course by the way if you're learning anything from this video please
00:05:11.420 | do leave a like and a comment to let me know. Before I end with arguably my most interesting
00:05:16.460 | examples let me give you another win for Bard. I asked both Bard and GPT-4 which powers Bing
00:05:23.100 | to come up with five prompts for Midjourney v5. For almost the first time I saw Bard link to an
00:05:29.980 | article. In general I must say Bing does this much better and its outputs are littered with links
00:05:36.060 | whereas they're hard to see and few and far between with Bard. But anyway the links seem
00:05:40.940 | to work because the prompts that Bard came up with were far better. You can see the reasons
00:05:46.440 | below, in the
00:05:51.960 | explanations. But I want to show you the outputs. This is Midjourney v5 and this was
00:05:51.960 | Bard's suggestion of a painting of a cityscape in the style of Klimt. I think this really does
00:05:57.720 | capture his style. This was a 3D animation of a battle scene in the style of Attack on Titan and
00:06:03.800 | this was a 2D comic book panel of a superhero in the style of Marvel. If you don't teach Bing how
00:06:09.480 | to do a good prompt and see my video on that topic its prompts tend to be a little bland as you can see.
00:06:16.420 | What were my final two tests? Well I wanted to test both of them on joke explanation first and I saw it
00:06:22.100 | as a kind of game of chicken because they both did really well so I wanted to keep going until I found
00:06:27.620 | a joke that one of them couldn't explain. I started with "what do you get when you cross a joke with a
00:06:33.300 | rhetorical question?" and both of them figured out that that was a joke and explained it fine. What
00:06:38.900 | about this kind of riddle? This sentence contains exactly three errors. They both understand that the third error is the claim that
00:06:46.400 | the sentence contains three errors, because it only contains two. Okay fine I would have to try harder.
00:06:52.700 | So then I tried this one. I tried to steal spaghetti from the shop but the female guard saw me and I
00:06:58.100 | couldn't get pasta. Somewhat annoyingly they both understood that joke. What about "did you know if
00:07:03.260 | you get pregnant in the Amazon it's next day delivery?" I honestly thought they might shy away from this one because it touched on a rival company but no they both explained it.
00:07:11.900 | But then I finally found one. It was this one. "By my age my
00:07:16.380 | parents had a house and a family and to be fair to me so do I but it's the same house and it's the
00:07:22.280 | same family." Bard thinks that I'm not joking and actually almost calls social services. It says
00:07:28.520 | "people are different, times have changed, I understand you're frustrated." It's very sympathetic
00:07:34.160 | but it didn't get that I was telling a joke and that's kind of despite the fact that I just told
00:07:38.360 | about five other jokes. Bard must have been really worried for my safety thinking that I was pregnant
00:07:43.080 | in the Amazon but living with my parents. Who knows what was going on
00:07:46.360 | in Bard's head but Bing was smarter. As you've seen today it's often smarter. It got that I was
00:07:52.260 | telling a joke and even when I prodded it further and said "explain the joke in full" it did it even
00:07:57.600 | using fancy vocab like subverting the common assumptions. Yet another win for Bing. A few days
00:08:03.580 | ago I put out a video on the debate about AI theory of mind and consciousness and if you're
00:08:09.120 | in any way interested in that topic please do check it out after this video. But the key moment
00:08:13.800 | in that video actually came right at the end and it was
00:08:16.340 | eye-opening for a lot of people including me. I asked Bing powered by GPT-4 "do you think that I
00:08:22.840 | think you have theory of mind?" It's a very meta question testing if the language model can get
00:08:28.640 | into my head, can assess my mental state and the correct answer would have been to point out that
00:08:33.680 | the motivations behind my question were to test the language model if it had theory of mind. Bing
00:08:39.840 | realized that it was being tested which was a truly impressive feat. Now you can read Bard's answer for yourself
00:08:46.320 | but I don't think it comes across as a model that's expressing that it's being tested. It did attempt to
00:08:52.480 | predict whether I thought it had theory of mind but it didn't get the deeper point that the question
00:08:57.760 | itself was testing for theory of mind. Again check out my video on that topic if you want to delve
00:09:03.260 | more into this. Now obviously I've only had access to the Bard model for around an hour so I will be
00:09:08.820 | doing far more tests in the coming hours, days and weeks. And if you are at all interested in this topic please
00:09:16.300 | do stick around for the journey, leave a like, subscribe and let me know in the comments. Have a wonderful day.