
Google Bard - The Full Review. Bard vs Bing [LaMDA vs GPT 4]


Transcript

I signed up to the Bard waitlist within a minute of it opening (and yes, I know that makes me kind of sad), but I wanted to do these experiments. I got in and have done over a hundred experiments comparing Bard with Bing, and Bing, don't forget, is powered by GPT-4.

Today I'm going to show you around a dozen of the most interesting results, and there are some surprising contrasts between the two of them: some real strengths and weaknesses of Bard that you might not have expected. But I'm going to start off, somewhat controversially, with a clear similarity: they are both pretty bad at search.

If you just want to do a simple web search, you are honestly better off just googling it. Take this example: how many florists are within a 10-minute walk of the British Museum? Both Bard and Bing really don't understand the "within a 10-minute walk" part. Bard gave me answers, like the first one, that are a good half-hour walk away, whereas Bing gave me an answer in Hampstead, which is nowhere near the British Museum and definitely not a 10-minute walk away like it claims.
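Just to put a number on that, here's a rough sanity check. It's only a sketch, with approximate coordinates for the British Museum and Hampstead and an assumed walking pace of about 5 km/h, so treat the figures as ballpark:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Approximate coordinates, assumed for illustration.
british_museum = (51.5194, -0.1270)
hampstead = (51.5560, -0.1780)

distance_km = haversine_km(*british_museum, *hampstead)
walking_minutes = distance_km / 5.0 * 60  # assuming roughly 5 km/h on foot

print(f"~{distance_km:.1f} km straight-line, ~{walking_minutes:.0f} minutes on foot")
# Comes out at roughly 5 km and over an hour's walk, even as the crow flies,
# so Hampstead is nowhere near a 10-minute radius of the museum.
```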

So to be honest, if you have something simple to search, just use normal Google. Next was basic math, and this one is a bit more concerning for Google. I asked a relatively simple percentage question and it flopped it.

Bard's explanation was pretty misleading, honestly terrible, and when you click on "view other drafts" (a feature that, in fairness, Bing doesn't have) you can see it also got it wrong in draft two. Luckily it didn't get it wrong in draft three, but this was the first prompt where I saw a real difference emerging between Bard and Bing powered by GPT-4.

It was a dividing line that would get stronger as time went on, with Bing being just that bit smarter than Bard. Not in every case, and there were some important exceptions, but in most cases Bing, powered by GPT-4, is smarter. Here's another algebra example that Bard flops and Bing gets right, and this time every single one of Bard's drafts got it wrong.
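To give you a feel for the kind of thing they were tripping over, here's a worked example. These are hypothetical stand-ins, not the exact questions I used, but a few lines of Python settle either type instantly:

```python
# Hypothetical examples of the kinds of questions involved, not the originals.

# Percentage change: going from 80 to 100 is a 25% increase,
# even though going from 100 back down to 80 is only a 20% decrease.
old, new = 80, 100
print(f"{old} -> {new}: {(new - old) / old * 100:.0f}% increase")  # 25% increase
print(f"{new} -> {old}: {(old - new) / new * 100:.1f}% change")    # -20.0% change

# Simple linear equation: solve 3x + 7 = 22 for x.
a, b, c = 3, 7, 22
print(f"x = {(c - b) / a}")  # x = 5.0
```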

The next case study involved more difficult questions, this time about the details of dates. My conclusion from this one is simple: don't trust either of them on dates.

I asked how many days there were between the opening of the Eiffel Tower and the Statue of Liberty, and both got it wrong. If you noticed, when I pointed out the mistake to Bard and asked why it had said three years and four months, it did apologize and say that yes, there are seven months between those dates.
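For what it's worth, this is an easy one to check yourself. Here's a quick sketch assuming the commonly cited dates, the Statue of Liberty's dedication on 28 October 1886 and the Eiffel Tower's inauguration on 31 March 1889; "opening" can be read a couple of ways, so the exact figure shifts a little depending on which dates you pick:

```python
from datetime import date

# Commonly cited dates; "opening" is slightly ambiguous for both monuments,
# so these are assumptions for the sake of the check.
statue_of_liberty = date(1886, 10, 28)  # dedication
eiffel_tower = date(1889, 3, 31)        # inauguration

print((eiffel_tower - statue_of_liberty).days)  # 885 days, roughly 2 years and 5 months
```

Whichever plausible dates you pick, the gap is somewhere around two and a half years, so neither of Bard's answers fits.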

I also found it kind of funny that after each answer it said "google it, please google it", and to be honest I don't know if that's them admitting that their model isn't quite as good as the hype may have made it seem, or if they just want to keep more of the ad revenue that they get from Google search.

But finally it's time to give you a win for Bard, and that is in joke telling. To be honest, even in creative mode, when you ask Bing to tell a joke it really can't do it.

These jokes are just awful. "What do you call a chatbot that can write poetry? Google Bard." Okay. "What do you call a chatbot that can't write poetry? ChatGPT", followed by a laughing face. I don't think Bing realizes that the art of a joke is being concise and witty. Bard kind of gets this and says things like "What do you call a Bing search? A lost cause."

"What's the difference between Bing and a broken clock? A broken clock is right twice a day." Okay, in fairness they still didn't make me laugh, but they were getting closer to a funny joke. But now back to a loss for Bard, which is in grammar and writing assistance. I gave it a classic GMAT sentence correction question where essentially you have to pick the version that sounds the best, that is, the one that is written in the best way.

Bing gets this right almost every time, picking B, which is well written, whereas Bard, as you can see even if you look at the other drafts, gets it wrong more times than it gets it right.

That's pretty worrying for Google if anyone is going to use Bard as a writing assistant, maybe to check grammar or to compose an email. These are the classic use cases that both Microsoft and Google advertise their services for, and to be honest this was not a one-off win for Bing.

Let me show you the next example. This was a challenge to compose a sonnet based on a subject and by this point in my experimentation I kind of expected the result that I got. When I asked both Bard and Bing to write me a sonnet about modern London life, Bard gave me an answer that was quite dry, anodyne and didn't always rhyme.

Even setting aside those flaws, it was just bland; there was no sharpness or social commentary, and notice I said about modern London life. Not only was Bing's answer much more like a true sonnet, there was even social commentary. Take a look at the second stanza: "but underneath the surface there are cracks, the cost of living rises every day."

This is something that's talked about in London all the time, and it's so much better than Bard's output. Now, before I carry on, I do get why Bard, based on LaMDA, isn't quite as good as Bing, based on GPT-4. Google has far more users, and honestly the outputs of Bard come up quicker.

You can tell they're using a lighter model. Now, for the millions or maybe even billions of people who just want a quick output, Bard will be fine, and let's be honest, we all know that there are social and ethical concerns with both models. If you're new to my channel, check out all my other videos on Bing and GPT-4, and by the way, if you're learning anything from this video, please do leave a like and a comment to let me know.

Before I end with arguably my most interesting examples, let me give you another win for Bard. I asked both Bard and GPT-4, which powers Bing, to come up with five prompts for Midjourney v5. For almost the first time, I saw Bard link to an article. In general, I must say Bing does this much better: its outputs are littered with links, whereas with Bard they're hard to see and few and far between.

But anyway, the links seem to have helped, because the prompts that Bard came up with were far better; you can see the reasons below. If you're new to my channel, please subscribe and hit the bell icon so you don't miss any of the explanations. But I want to show you the outputs.

This is Midjourney v5 and this was Bard's suggestion of a painting of a cityscape in the style of Klimt. I think this really does capture his style. This was a 3D animation of a battle scene in the style of Attack on Titan and this was a 2D comic book panel of a superhero in the style of Marvel.

If you don't teach Bing how to write a good prompt (see my video on that topic), its prompts tend to be a little bland, as you can see. What were my final two tests? Well, I wanted to test both of them on joke explanation first, and I saw it as a kind of game of chicken: they both did really well, so I wanted to keep going until I found a joke that one of them couldn't explain.

I started with "what do you get when you cross a joke with a rhetorical question?" and both of them figured out that it was a joke and explained it fine. What about this kind of riddle: "This sentence contains exactly three errors"? They both understood that the third error is the statement itself, because the sentence claims to contain three errors when it only contains two.

Okay, fine, I would have to try harder. So then I tried this one: "I tried to steal spaghetti from the shop, but the female guard saw me and I couldn't get pasta." Somewhat annoyingly, they both understood that joke. What about "did you know if you get pregnant in the Amazon, it's next day delivery?" I honestly thought they might shy away from this one because it touches on a rival company, but no, they both explained it.

But then I finally found one. It was this one. "By my age my parents had a house and a family and to be fair to me so do I but it's the same house and it's the same family." Bard thinks that I'm not joking and actually almost calls social services.

It says "people are different, times have changed, I understand you're frustrated." It's very sympathetic but it didn't get that I was telling a joke and that's kind of despite the fact that I just told about five other jokes. Bard must have been really worried for my safety thinking that I was pregnant in the Amazon but living with my parents.

Who knows what was going on in Bard's head, but Bing was smarter. As you've seen today, it's often smarter. It got that I was telling a joke, and even when I prodded it further and said "explain the joke in full", it did, even using fancy vocab like "subverting the common assumptions".

Yet another win for Bing. A few days ago I put out a video on the debate about AI theory of mind and consciousness and if you're in any way interested in that topic please do check it out after this video. But the key moment in that video actually came right at the end and it was eye-opening for a lot of people including me.

I asked Bing, powered by GPT-4, "do you think that I think you have theory of mind?" It's a very meta question, testing whether the language model can get into my head and assess my mental state, and the correct answer would have been to point out that the motivation behind my question was to test whether the language model itself had theory of mind.

Bing realized that it was being tested which was a truly impressive feat. Now you can read Bard's answer for yourself but I don't think it comes across as a model that's expressing that it's being tested. It did attempt to predict whether I thought it had theory of mind but it didn't get the deeper point that the question itself was testing for theory of mind.

Again check out my video on that topic if you want to delve more into this. Now obviously I've only had access to the Bard model for around an hour so I will be doing far more tests in the coming hours, days and weeks. And if you are at all interested in this topic please do stick around for the journey, leave a like, subscribe and let me know in the comments.

Have a wonderful day.