Google Bard - The Full Review. Bard vs Bing [LaMDA vs GPT 4]
00:00:00.000 |
I signed up to the Bard waitlist within a minute of it opening and yes I know that makes me kind 00:00:04.860 |
of sad but I wanted to do these experiments and I got in and have done over a hundred experiments 00:00:10.880 |
comparing Bard with Bing and Bing don't forget is powered by GPT-4. I'm going to show you today 00:00:16.220 |
around a dozen of the most interesting results and there are some surprising contrasts between 00:00:21.380 |
the two of them. Some real strengths and weaknesses of Bard that you might not have expected but I'm 00:00:27.220 |
going to start off somewhat controversially with a clear similarity. They are both pretty bad at 00:00:33.580 |
search. If you just want to do a simple web search you are better off honestly just googling it. Take 00:00:39.320 |
this example: "how many florists are within 10 minutes walk of the British Museum?" Both Bard and 00:00:44.020 |
Bing really don't understand that within 10 minutes walk bit. Bard gave me answers like the first one 00:00:49.400 |
that are like a half an hour walk away whereas Bing gave me an answer in Hampstead. That is nowhere 00:00:54.580 |
near the British Museum and definitely not a 10 minute walk away like it claims. So to be honest 00:00:59.960 |
if you have something simple to search just use the normal Google. Next was basic math and this 00:01:05.280 |
is a bit more concerning for Google. I asked a relatively simple percentage question and it 00:01:11.180 |
flopped it. Bard's explanation was pretty misleading and terrible and when you click on 00:01:16.820 |
view other drafts which is a feature that Bing doesn't have in fairness it also got it wrong 00:01:22.000 |
in draft two. Luckily it 00:01:24.560 |
didn't get it wrong in draft three but this was the first prompt where I saw a real difference 00:01:29.560 |
emerging between Bard and Bing powered by GPT-4. It was a dividing line that would get stronger as 00:01:35.160 |
time went on with Bing being just that bit smarter than Bard. Not in every case and there were some 00:01:40.480 |
important exceptions but in most cases Bing powered by GPT-4 is smarter. Here's another algebra example 00:01:46.720 |
that Bard flops and Bing gets right and this time every single draft got it wrong for Bard. The next 00:01:53.220 |
case study involved more difficult questions, 00:01:54.540 |
this time about the details of dates. 00:01:57.420 |
And my conclusion from this is don't trust either of them on dates. I asked about how many days were 00:02:03.500 |
there between the opening of the Eiffel Tower and the Statue of Liberty and both got it wrong. If you 00:02:08.460 |
noticed when I pointed out the mistake with Bard and said why did you say three years and four 00:02:13.100 |
months it did apologize and say yes there are seven months between those dates. I also found it kind 00:02:18.380 |
of funny that after each answer it said google it please google it and to be honest I don't know if 00:02:23.580 |
that's them admitting that 00:02:24.380 |
their model isn't quite as good as the hype may have made it seem or if they just want to keep more 00:02:29.900 |
of the ad revenue that they get from google search. But finally it's time to give you a win for Bard 00:02:35.660 |
and that is in joke telling. To be honest Bing even in creative mode when you ask it to tell a joke 00:02:40.860 |
it really can't do it. These jokes are just awful. What do you call a chatbot that can write poetry? 00:02:46.780 |
Google Bard okay. What do you call a chatbot that can't write poetry? ChatGPT Laughing Face. 00:02:53.420 |
I don't think Bing realizes that the art of a joke is being concise and witty. Bard kind of gets this 00:02:59.100 |
and says things like "what do you call a Bing search? A lost cause." What's the difference between Bing 00:03:03.980 |
and a broken clock? A broken clock is right twice a day. Okay in fairness they still didn't make me 00:03:08.940 |
laugh but they were getting closer to a funny joke. But now back to a loss for Bard which is in 00:03:15.100 |
grammar and writing assistance. I gave it a classic GMAT sentence correction question where essentially 00:03:21.260 |
you have to pick the version that sounds the best, 00:03:22.460 |
that is written in the best way. Bing gets this right almost every time picking B which is well 00:03:28.860 |
written. Whereas Bard as you can see even if you look at the other drafts gets it wrong more times 00:03:34.140 |
than it gets it right. That's pretty worrying for Google if anyone is going to use Bard as a writing 00:03:39.340 |
assistant. Maybe to check grammar or to compose an email. These are the classic cases that both 00:03:45.500 |
Microsoft and Google are advertising that their services can do and to be honest this was not a 00:03:51.100 |
one-off win for Bing. Let me show you the next example. This was a challenge to compose a sonnet 00:03:57.100 |
based on a subject and by this point in my experimentation I kind of expected the result 00:04:02.140 |
that I got. When I asked both Bard and Bing to write me a sonnet about modern London life, 00:04:07.980 |
Bard gave me an answer that was quite dry, anodyne and didn't always rhyme. Even setting 00:04:13.100 |
aside those flaws it was just bland. There was no sharpness or social commentary. Notice I said about modern London life. 00:04:19.980 |
Not only was Bing's answer much more like a true sonnet there was even social commentary. 00:04:25.820 |
Take a look at the second stanza: "but underneath the surface there are cracks. The cost of living 00:04:31.340 |
rises every day." This is something that's talked about in London all the time and is so much better 00:04:36.860 |
than Bard's output. Now before I carry on I do get why Bard based on LaMDA isn't quite as good 00:04:42.700 |
as Bing based on GPT-4. Google has far more users and honestly the outputs of Bard come up quicker. 00:04:49.820 |
You can tell they're using a lighter model. Now for millions or maybe even billions of people who 00:04:55.660 |
just want a quick output Bard will be fine and let's be honest we all know that there are social 00:05:01.180 |
and ethical concerns with both models. If you're new to my channel check out all my other videos 00:05:06.460 |
on Bing and GPT-4 and of course by the way if you're learning anything from this video please 00:05:11.420 |
do leave a like and a comment to let me know. Before I end with arguably my most interesting 00:05:16.460 |
examples let me give you another win for Bard. I asked both Bard and GPT-4 which powers Bing 00:05:23.100 |
to come up with five prompts for Midjourney v5. For almost the first time I saw Bard link to an 00:05:29.980 |
article. In general I must say Bing does this much better and its outputs are littered with links 00:05:36.060 |
whereas they're hard to see and few and far between with Bard. But anyway the links seem 00:05:40.940 |
to work because the prompts that Bard came up with were far better. You can see the reasons 00:05:45.980 |
below 00:05:46.440 |
but I want to show you the outputs. This is Midjourney v5 and this was 00:05:51.960 |
Bard's suggestion of a painting of a cityscape in the style of Klimt. I think this really does 00:05:57.720 |
capture his style. This was a 3D animation of a battle scene in the style of Attack on Titan and 00:06:03.800 |
this was a 2D comic book panel of a superhero in the style of Marvel. If you don't teach Bing how 00:06:09.480 |
to do a good prompt (and see my video on that topic) its prompts tend to be a little bland as you can see. 00:06:16.420 |
What were my final two tests? Well I wanted to test both of them on joke explanation first and I saw it 00:06:22.100 |
as a kind of game of chicken because they both did really well so I wanted to keep going until I found 00:06:27.620 |
a joke that one of them couldn't explain. I started with "what do you get when you cross a joke with a 00:06:33.300 |
rhetorical question?" and both of them figured out that that was a joke and explained it fine. What 00:06:38.900 |
about this kind of riddle? "This sentence contains exactly three errors." They both understand that the third error is the claim that 00:06:46.400 |
the sentence contains three errors, because it only contains two. Okay fine I would have to try harder. 00:06:52.700 |
So then I tried this one: "I tried to steal spaghetti from the shop but the female guard saw me and I 00:06:58.100 |
couldn't get pasta." Somewhat annoyingly they both understood that joke. What about "did you know if 00:07:03.260 |
you get pregnant in the Amazon it's next day delivery?" I honestly thought they might shy away from this one because it touched on a rival company but no they both explained it. 00:07:11.900 |
But then I finally found one. It was this one. "By my age my 00:07:16.380 |
parents had a house and a family and to be fair to me so do I but it's the same house and it's the 00:07:22.280 |
same family." Bard thinks that I'm not joking and actually almost calls social services. It says 00:07:28.520 |
"people are different, times have changed, I understand you're frustrated." It's very sympathetic 00:07:34.160 |
but it didn't get that I was telling a joke and that's kind of despite the fact that I just told 00:07:38.360 |
about five other jokes. Bard must have been really worried for my safety thinking that I was pregnant 00:07:43.080 |
in the Amazon but living with my parents. Who knows what was going on 00:07:46.360 |
in Bard's head but Bing was smarter. As you've seen today it's often smarter. It got that I was 00:07:52.260 |
telling a joke and even when I prodded it further and said "explain the joke in full" it did it even 00:07:57.600 |
using fancy vocab like subverting the common assumptions. Yet another win for Bing. A few days 00:08:03.580 |
ago I put out a video on the debate about AI theory of mind and consciousness and if you're 00:08:09.120 |
in any way interested in that topic please do check it out after this video. But the key moment 00:08:13.800 |
in that video actually came right at the end and it was 00:08:16.340 |
eye-opening for a lot of people including me. I asked Bing powered by GPT-4 "do you think that I 00:08:22.840 |
think you have theory of mind?" It's a very meta question testing if the language model can get 00:08:28.640 |
into my head, can assess my mental state and the correct answer would have been to point out that 00:08:33.680 |
the motivation behind my question was to test whether the language model had theory of mind. Bing 00:08:39.840 |
realized that it was being tested which was a truly impressive feat. Now you can read Bard's answer for yourself 00:08:46.320 |
but I don't think it comes across as a model that's expressing that it's being tested. It did attempt to 00:08:52.480 |
predict whether I thought it had theory of mind but it didn't get the deeper point that the question 00:08:57.760 |
itself was testing for theory of mind. Again check out my video on that topic if you want to delve 00:09:03.260 |
more into this. Now obviously I've only had access to the Bard model for around an hour so I will be 00:09:08.820 |
doing far more tests in the coming hours, days and weeks. And if you are at all interested in this topic please 00:09:16.300 |
do stick around for the journey, leave a like, subscribe and let me know in the comments. Have a wonderful day.