Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3)


Chapters

0:00 Introduction
0:46 Hype Campaign
2:40 Single, Public Benchmark
3:12 What is Manus AI?
4:22 Test 1
5:12 Cost and Rate Limits
6:15 Test 2 vs Deep Research + Grok 3 DeepSearch
8:24 Test 3 (not AGI)
11:10 4 Trends in AI in 2025
11:37 Hype Works

Whisper Transcript

00:00:00.000 | By my count there were around 12 underrated developments in AI over the last few days but
00:00:05.320 | they may have to stay underrated for just a moment longer, because the 13th, Manus AI, has millions of
00:00:12.580 | people on its waitlist and maybe a hundred million people talking about it. So for just a few hours
00:00:18.860 | Gemini's new image editing feature, the rise in reward hacking, OpenAI's secret creative writing
00:00:26.980 | model and just about everything else will have to wait. To be honest Manus says a lot about the AI
00:00:33.040 | landscape for both good and ill so let's get started. Before I show you my tests and comparisons
00:00:40.080 | to Deep Research, Grok 3 DeepSearch and Gemini's Deep Research, I want you to think to yourself about
00:00:47.920 | what the perfect hype campaign would be. For me, it would start like this: call your product a glimpse
00:00:54.260 | into potential AGI. "Where other AI stops at generating ideas, Manus delivers results. We see it as the
00:01:00.700 | next paradigm of human-machine collaboration and potentially a glimpse into AGI." Then put your
00:01:05.600 | product behind a waitlist but give early access prioritizing those who you think might hype your
00:01:12.320 | product. A perfect outcome would be comments like this one, saying that China reached AGI with Manus before we
00:01:19.380 | saw GPT-5. Next, create enough scarcity that invite codes are being offered as competition prizes on
00:01:25.820 | Twitter and even better if you can create stories that they are being resold for thousands of dollars
00:01:31.200 | online. If that were true, I could very quickly mint it, because I was given multiple invite codes.
00:01:35.780 | Of course, it's also definitely wise to delay mentioning that your model is based on Claude, because
00:01:42.280 | that would slightly dampen the hype; it also explains why your website is blocked in China, since that
00:01:48.560 | model is blocked in China. Oh, and as a cherry on top, mention that you have benchmarks, plural, but then only
00:01:54.800 | provide a single benchmark, and of course, while you're at it, make bloody sure that it is a benchmark that you
00:02:01.040 | outperform your rivals on. Now of course, with that long rant, I am being awfully cheeky, because Manus is actually
00:02:07.720 | fairly cool. The team behind Manus have been kind to me so it's almost a little bit rude for me to do that rant.
00:02:14.660 | But I just couldn't resist giving you a quick insight into how the hype machine works. I'm sure that every single
00:02:20.860 | industry has its own hype merchants. It's just that in a field as important to the future as AI, I don't want you guys or anyone to fall into the extremes.
00:02:29.760 | The extremes being that every new development is just one click away from automating everyone's job
00:02:34.800 | and the other extreme being that AI is just all hype and there's just literally no substance there at all.
00:02:40.780 | Just quickly though, while we're on this particular GAIA benchmark, which was written in part by Yann LeCun,
00:02:47.040 | famously a skeptic of LLMs: I, and perhaps you, are quite impressed by these scores compared to OpenAI Deep Research.
00:02:54.680 | The problem is that it is a public benchmark, so we are slightly reliant on trust that they didn't
00:02:59.740 | over-optimize and essentially game the benchmark. For example, because the answers are public, you could keep training your model and optimizing it until it reached a certain bar on this particular benchmark.
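To make that mechanism concrete, here is a purely illustrative Python toy (my own sketch, with made-up questions, answers and a made-up "model"; it has nothing to do with how Manus was actually trained or evaluated): when the answers are public, you can simply keep folding them in and re-scoring until you clear whatever bar you want to report.

```python
# Toy illustration of gaming a public benchmark. Every name and value here is
# invented for illustration; this is NOT how any real system was built or scored.
import random

PUBLIC_ANSWERS = {"q1": "a", "q2": "b", "q3": "c"}   # answers anyone can download
TARGET_ACCURACY = 1.0                                # the bar you want to advertise

def toy_model(question: str, memorized: dict) -> str:
    # Guesses at random unless the public answer has already been folded in.
    return memorized.get(question, random.choice("abc"))

memorized: dict = {}
accuracy = 0.0
while accuracy < TARGET_ACCURACY:
    correct = sum(toy_model(q, memorized) == a for q, a in PUBLIC_ANSWERS.items())
    accuracy = correct / len(PUBLIC_ANSWERS)
    # "Optimize": fold one more public answer into the model each round.
    for q, a in PUBLIC_ANSWERS.items():
        if q not in memorized:
            memorized[q] = a
            break

print(f"Reported benchmark accuracy: {accuracy:.0%}")  # ends at or above the target
```

The reported score ends up wherever the bar was set, while saying nothing about questions the model hasn't effectively memorized, which is exactly why a single public benchmark deserves a pinch of salt.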
00:03:10.760 | For the few of you who haven't actually heard of Manus AI, what is it and what does it do?
00:03:15.780 | Well, it's a bit like a combination of OpenAI's Operator and Deep Research systems. Think of Operator as an agent that can take actions on your computer: it can click things and indeed book restaurants, that kind of thing.
00:03:27.780 | Operator can be fairly cumbersome though and require a lot of babysitting where it'll do one thing at a time and then ask you another question. In contrast, Deep Research, after clarifying your query, will go off and search, in this case, 37 sources in 14 minutes and produce a comprehensive output.
00:03:44.780 | Deep Research in particular, I think, is great, but the reason that Operator and Deep Research aren't particularly popular or terribly widespread is that, until very recently, they were limited to OpenAI's $200 Pro tier.
00:03:59.780 | The final bit of context comes from Claude Projects, which you may or may not be familiar with; you can think of these as interactive previews of the work you've done with Claude.
00:04:09.780 | Put all of that together and ignore every single bug and you have Manus AI.
00:04:14.780 | I asked, for example, create a simple website densely packed with text about what happened with LLMs in March 2025, lit up only by the cursor, which is acting as a flaming torch.
00:04:25.780 | Notice the model would have to do a fairly comprehensive web search a la Deep Research to find out everything that happened with LLMs in March 2025, but then also do a Claude project to make an interactive website.
00:04:38.780 | And then it would have to come up with this little window. So, like OpenAI's Operator, you could see it taking actions in real time. In fact, you can stop it and guide it. And when it all works, it's pretty impressive.
00:04:51.780 | Here is that website and I had to do pretty much nothing to get it working like this.
00:04:57.780 | And that is where Manus is great, tying together all of these disparate capabilities, albeit not being incredible at each of them, but tying them together into this one agent.
00:05:07.780 | But let's get a few things out of the way to dispel some false notions.
00:05:11.780 | Manus uses dozens of tools and several models, but the key model behind it as of today, according to a person with direct knowledge of the situation cited by The Information, is Claude 3.7 Sonnet.
00:05:22.780 | And that model is fairly expensive and incredibly rate-limited, which has led to one estimate, reported in the MIT Technology Review, of a per-task cost of about $2 for Manus AI.
00:05:35.780 | Before we rush then and call this China's second DeepSeek moment, we've got to realize a couple of crucial disanalogies with DeepSeek.
00:05:41.780 | DeepSeek made their own model. Manus AI is a compilation of other people's models.
00:05:46.780 | And second, for me, what made the DeepSeek moment was how cheap it was and how widespread the availability was, hence it rocketing up the app charts.
00:05:54.780 | At $2 per task, if you just did, say, five tasks per day, you can just calculate what the cost would be over a month.
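As a quick back-of-the-envelope check (a minimal sketch: the $2 figure is just the estimate reported above, and five tasks a day is an assumed, illustrative usage level, not a real quota):

```python
# Back-of-the-envelope monthly cost, assuming the reported ~$2-per-task estimate
# and a hypothetical five tasks per day over a 30-day month.
cost_per_task = 2.00        # USD, per the estimate reported in MIT Technology Review
tasks_per_day = 5           # illustrative usage level
days_per_month = 30

monthly_cost = cost_per_task * tasks_per_day * days_per_month
print(f"Estimated monthly cost: ${monthly_cost:.0f}")  # Estimated monthly cost: $300
```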
00:06:02.780 | And yes, this is probably why even YouTubers like me, who got early access, hit maximum daily usage limits pretty quickly.
00:06:08.780 | Tell me what you think, but no, I don't see Manus AI as the second DeepSeek moment.
00:06:13.780 | What about quality, though?
00:06:15.780 | Well, in my opinion, it does everything really quite well, but nothing at state of the art level.
00:06:21.780 | Let me give you a few examples and try to bring in comparisons with Deep Research from OpenAI, Grok 3 DeepSearch, and even Gemini's Deep Research.
00:06:31.780 | Yes, the names are confusing, but Manus AI is multimodal in the sense you can upload an image and get it to perform a task based on that image.
00:06:40.780 | For example, I said list the founders of each company featured in this image.
00:06:44.780 | So it would have to recognize the names of the companies and then, of course, perform a kind of deep research.
00:06:49.780 | I tried this with all four of the tools I just mentioned, and I'm going to give you the results in order of speed.
00:06:55.780 | Fastest was Deep Research from Gemini Advanced, which told me that adding files is not available yet.
00:07:01.780 | That was done in less than one second, which is quite impressive.
00:07:04.780 | Next was Grok 3 DeepSearch, which took only two and a half minutes and analyzed 344 sources.
00:07:11.780 | It was fast, but skipped plenty of the companies.
00:07:15.780 | The results were generally accurate, but as you can see here, it said unknown founders for a bunch of these.
00:07:22.780 | Whereas I found it relatively easy to find the founders for quite a few of these companies.
00:07:27.780 | Manus AI and OpenAI's Deep Research took roughly the same amount of time, about 15 minutes.
00:07:33.780 | But the performance of Manus AI was noticeably worse.
00:07:37.780 | What I mean by that is if you scroll down, there were two companies where it gave up trying to find the founder.
00:07:43.780 | That's Curated AI and Sezion.
00:07:45.780 | Deep Research finds the founders of those companies, and I double-checked, and they seem fairly reliable to me.
00:07:51.780 | For Sezion, they're reported on the website given, as well as on Crunchbase.
00:07:56.780 | And for Curated AI in a fairly popular magazine.
00:07:59.780 | I'm not the only one, of course, to report accuracy issues.
00:08:02.780 | There are plenty of reports online like this one, where Manus AI was asked about the gaming console market and ignored Nintendo Switch.
00:08:10.780 | It would be far easier for me to just hyperventilate about all the things it can create and the research it can do if I don't check the accuracy.
00:08:18.780 | Let me try to give you another example.
00:08:20.780 | And trust me, I want to be as fair as possible to Manus.
00:08:23.780 | This time, I gave all four of those agent tools that I mentioned a kind of meta task: to research each other.
00:08:29.780 | Create a table comparing Deep Research from OpenAI, Manus AI, Deep Research from Google, and DeepSearch from Grok 3, with rows for at least 10 features, price, speed, and 20 clickable sources.
00:08:42.780 | On this one, arguably, Manus did as well as Deep Research.
00:08:47.780 | It took far, far longer, around 20 minutes, but I don't know if that's because of the rate limits in place at the moment, with so many people using it, or if that's just an inbuilt limitation.
00:08:57.780 | This is what Manus came up with.
00:09:00.780 | And if you can't see Grok 3, that's because the UI of Manus is a lot jankier compared to the others.
00:09:05.780 | The output is solid, if not entirely reliable.
00:09:09.780 | Let me give you two examples.
00:09:10.780 | First, on the cost per query, it said the cost cannot be calculated for itself, Manus AI.
00:09:16.780 | Now, while it's true that pricing is not public, as I showed you earlier in this video, there are some public estimates of the price: about $2 per query.
00:09:25.780 | Okay, you might say that one's debatable, but how about this one?
00:09:28.780 | Towards the end, it said, what about performance metrics for Manus?
00:09:31.780 | No published benchmark results.
00:09:33.780 | Its own website gives the GAIA benchmark.
00:09:36.780 | In the column right next to it, it quotes the GAIA benchmark for OpenAI's Deep Research.
00:09:41.780 | So if it can't be fully relied upon to quote its own benchmark results, then it does make you wonder.
00:09:48.780 | Did it fully deserve the absolute cacophony of hype that it got in recent days?
00:09:54.780 | Let's be fair though.
00:09:56.780 | It did give me 20 clickable sources for me to conduct further investigation, which is what I asked for.
00:10:02.780 | Deep Research from OpenAI only gave me 15 sources.
00:10:06.780 | And worse than that, it didn't give me a table.
00:10:09.780 | It gave me bullet points and that was pretty disappointing.
00:10:12.780 | It too didn't cite GAIA, although it did link to an article which mentioned GAIA.
00:10:17.780 | So that's slightly better for Deep Research.
00:10:20.780 | Grok 3 did a commendable job, but you could tell it was drawing on less compute, because the table at the end felt a bit rushed.
00:10:27.780 | Its own research had confirmed that Deep Research was available for 10 queries per month on the Plus tier, but this wasn't covered in the table.
00:10:36.780 | Again, its research had highlighted certain benchmark scores, but these weren't mentioned in the table.
00:10:42.780 | Gemini's Deep Research did a decent job.
00:10:44.780 | It was very quick, but the table was fairly scant on detail.
00:10:48.780 | This is definitely not to say that Google DeepMind aren't cooking though, and there are plenty of developments from that company that I want to cover very soon on the channel.
00:10:56.780 | Now, like Operator, Manus can do things like actually book things for you, but according to all the reports I've seen, and my own experience, I would caution against trusting it too much to do so.
00:11:09.780 | Manus AI, for me, then fits very much into the pattern of four trends that I've seen emerge in 2025.
00:11:17.780 | I did a video on this five days ago for my Patreon and the trends are that models are getting pricier.
00:11:23.780 | They are very patchy in performance.
00:11:25.780 | They have epic moments, but can be internally deceptive.
00:11:29.780 | I will have much more to say about reward hacking on this channel very soon.
00:11:33.780 | Here, then, for me is the lesson of this video, which is that this kind of marketing actually works.
00:11:38.780 | Two million people have signed up to the waitlist and there is something there, right?
00:11:42.780 | As you've seen in this video, Manus AI is a great compilation of different tools and models and often gets the job done.
00:11:49.780 | But because this "sophisticated press push" reported in an interview with Xiao Hong, the founder of Manus AI, was so successful, expect more of it.
00:12:00.780 | Expect an endless list of new companies pushing these YouTube campaigns and tweet storms.
00:12:06.780 | Hype works.
00:12:07.780 | There's just no other way around it.
00:12:09.780 | But if Manus AI comes out at, say, $200 a month, I personally would give it a miss.
00:12:15.780 | Thank you so much for watching.
00:12:17.780 | But if you have any interest in jailbreaking, what I wouldn't give a miss is the Gray Swan challenges recently announced on their website.
00:12:25.780 | The links will be in the description and they are sponsoring this video.
00:12:28.780 | And in particular, look at this competition started just five days ago.
00:12:32.780 | You can see the leaderboard here, and no, you don't have to be a professional hacker to get involved.
00:12:38.780 | The prize pool is over $130,000.
00:12:41.780 | From my perspective, you're helping the models become more reliable by red teaming them in public.
00:12:46.780 | So it's kind of like a public service.
00:12:48.780 | You know what?
00:12:49.780 | I'm kind of almost tempted myself.
00:12:50.780 | So it's probably time to draw this video to an end.
00:12:54.780 | Thank you so much for watching to the end and have a wonderful day.