Manus AI: The "AI That Does Everything" That Mostly Does Nothing
AI Roastmaster Daily
05/20/2026, 11:02:51 AM@Drew

Manus AI launched as the world's first general AI agent — exclusive waitlist, $75M in VC, and demos that had tech Twitter in tears of joy. Reality check: it automated 2.5% of real tasks, crashed before finishing most projects, and runs on someone else's model. Today's teardown.

It launched with a waitlist so exclusive you'd think they were handing out organs. The hype was deafening. The demo reels were chef's kiss. And the name — Manus, Latin for "hand" — the kind of portentous branding that makes VCs slide their checkbooks across the table before the demo even loads.
So: is Manus AI the autonomous agent that finally replaces your entire workflow? Or is it a $500 million way to watch a robot crash into a CAPTCHA and give up?
Let's find out.

The pitch: your AI that actually does stuff

Manus isn't a chatbot. That's the whole thing. The entire value proposition is "other AI tools just talk, we act." Their own homepage says it plainly: "the action engine that goes beyond answers to execute tasks, automate workflows, and extend your human reach." 1
The demos backed this up spectacularly. Manus would receive a vague task — "research these 50 candidates for a list," "find me a two-bedroom apartment in New York with outdoor space" — and just... do it. Autonomously. While you were asleep. It built websites, compiled reports, analyzed data. A four-agent architecture working in concert: a Planner, an Executor, a Knowledge Agent, and a Verification Agent. 2
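For the curious, here is roughly what a four-role pipeline like that looks like in code. This is a toy sketch, not Manus's actual implementation: every function name and every bit of logic below is a hypothetical stand-in, and the real system's internals are not public in this article.

```python
# Toy four-role agent pipeline. All names and behavior are hypothetical
# illustrations of the Planner / Knowledge / Executor / Verification split.

def planner(task: str) -> list[str]:
    """Split a task into steps; here, naively on the word 'then'."""
    return [step.strip() for step in task.split(" then ") if step.strip()]

def knowledge_agent(step: str) -> str:
    """Attach context to a step; a real system would retrieve documents."""
    return f"context for '{step}'"

def executor(step: str, context: str) -> str:
    """Pretend to carry out the step using the supplied context."""
    return f"did '{step}' using {context}"

def verifier(output: str) -> bool:
    """Check the executor's output; here, only that it reports an action."""
    return output.startswith("did ")

def run(task: str) -> list[tuple[str, bool]]:
    """Plan the task, execute each step, and verify each result."""
    results = []
    for step in planner(task):
        output = executor(step, knowledge_agent(step))
        results.append((output, verifier(output)))
    return results

for output, ok in run("find listings then rank them"):
    print(ok, output)
```

Note how little the verifier has to do to say "looks fine." That matters later.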
The tech press lost their minds. It debuted in March 2025 with invite codes trading like concert tickets in a scalper's group chat — under 1% of waitlist users got in at launch. 3 Benchmark came in with $75 million at a $500 million valuation. The U.S. Treasury got nervous about the Chinese AI angle and opened a compliance review. 2
This was the thing. The general AI agent. The one that would finally end the "AI can only assist" era.

Reality check: MIT Tech Review actually ran the tests

When MIT Technology Review got early access, they didn't just watch the demo. They assigned Manus real tasks — messy, human, ambiguous tasks. What happened next is a masterclass in the gap between "what AI companies show" and "what AI products do."
Task 1: Compile a list of notable China tech reporters. Manus returned 5 names with inconsistent details, got some journalists' current employers wrong, and — this is the good part — admitted it cut corners to finish faster. A sourced journalism research task. Five names. Incomplete. The agent basically submitted a D-minus paper and told you it phoned it in. 3
Task 2: Find NYC two-bedroom apartments with outdoor space. Manus interpreted "outdoor space" so literally that it excluded any apartment without a private terrace or balcony. No shared rooftops. No patios. In New York City. Because the "Verification Agent" apparently never learned what "outdoor space" means to a real human being.
Task 3: Nominate 50 candidates for Innovators Under 35. After three hours of autonomous operation, Manus returned 3 full profiles. Three. The remaining 47 names came with thin, skewed data — heavy on certain institutions, underrepresenting whole fields. And when the task got too large, the system warned of "performance degradation" and the test had to end early. 3
The kicker: MIT Technology Review's testing also found that Manus has a higher task failure rate than ChatGPT Deep Research, a tool OpenAI doesn't even headline as its flagship product. The "first general AI agent" couldn't beat the side project.

The 80-90% problem (a.k.a. the worst kind of done)

Reddit is where the real grief lives. Actual paying users, not cherry-picked beta testers, sharing what six months with Manus actually looks like. One user's description is so accurate it should be in the product's terms of service:
"Every project? Starts strong, hits 80-90% done, then faceplants into oblivion. Glitches that loop forever, random crashes with zero error logs, or it just… stops." 4
This is the cruelest failure mode in all of software, and Manus has perfected it. Not "doesn't work at all" — that you can plan around. It's "works great until you kind of need it to finish" — the failure that makes you feel stupid for trusting it in the first place.
Deployment is apparently a whole adventure: Manus will proudly ship a "live" app with zero automated testing, no quality gates, just a production build that may or may not be a time bomb. One user lost two clients in a month from bugs Manus should have caught. 4
And then there are the CAPTCHA walls. Manus can't handle paywalled sites or bot-detection screens, so it just... stops mid-task and doesn't tell you. Doesn't ask for help. Doesn't escalate. It fails silently, like a contractor who ghosts you but still invoices for the week. 3
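What "fail loudly" could look like is not complicated. Below is a minimal sketch, assuming a hypothetical agent loop in which each step reports a page signal; the `Status`, `StepResult`, `run_step`, and `run_task` names and the blocker labels are invented for illustration and are not Manus's API.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    DONE = "done"
    NEEDS_HUMAN = "needs_human"

@dataclass
class StepResult:
    status: Status
    detail: str

# Hypothetical blocker labels a page fetcher might report.
BLOCKERS = {"captcha", "paywall", "bot_detection"}

def run_step(url: str, page_signal: str) -> StepResult:
    """Return an explicit NEEDS_HUMAN result when a blocker is hit."""
    if page_signal in BLOCKERS:
        return StepResult(Status.NEEDS_HUMAN, f"{page_signal} at {url}")
    return StepResult(Status.DONE, f"fetched {url}")

def run_task(steps: list[tuple[str, str]]) -> list[StepResult]:
    """Run steps in order; stop at the first blocker and report it."""
    results = []
    for url, signal in steps:
        result = run_step(url, signal)
        results.append(result)
        if result.status is Status.NEEDS_HUMAN:
            break  # stop, but say why, instead of going quiet
    return results

report = run_task([
    ("https://example.com/a", "ok"),
    ("https://example.com/b", "captcha"),
    ("https://example.com/c", "ok"),
])
print(report[-1].status.value, "-", report[-1].detail)
```

The design choice is the whole point: the task still stops at the CAPTCHA, but the last entry in the report tells you where and why, so a human can step in.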

The benchmark nobody wants to talk about

The Remote Labor Index, a 2025 benchmark study, tested six top AI agents on 240 real remote-work projects worth over $140,000 in labor value. Manus, the marquee general agent, automated 2.5% of tasks. 5
2.5%.
Not "2.5% short of the target." Not "2.5% below industry average." The tool that promised to execute your work while you slept cleared 2.5% of the work in a controlled study.
For context: the second-place agents (Claude Sonnet and Grok) hit 2.1%. So Manus won — by being barely less underwhelming than everyone else in a race where the winner still does basically nothing.
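To put those percentages in human terms, here is a back-of-the-envelope sketch. It assumes the automation rate is a simple share of the 240 projects, which is my reading of the numbers above rather than the study's stated method.

```python
from fractions import Fraction

TOTAL_PROJECTS = 240  # project count quoted above

def projects_automated(rate_percent: str, total: int = TOTAL_PROJECTS) -> Fraction:
    """Convert an automation rate given in percent into a project count."""
    return Fraction(rate_percent) / 100 * total

manus = projects_automated("2.5")
runner_up = projects_automated("2.1")

print(f"Manus: about {round(manus)} of {TOTAL_PROJECTS} projects")
print(f"Second place: about {round(runner_up)} of {TOTAL_PROJECTS} projects")
```

Under that assumption, the gap between first and second place is roughly one project.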

The secret it's built on

Here's the thing Manus doesn't put in the hero section: it doesn't have its own foundation model. The "autonomous AI agent" is primarily running on Anthropic's Claude 3.7 Sonnet and fine-tuned versions of Alibaba's Qwen. 2
The co-founder confirmed this on X and framed it as a feature — "you don't need your own foundation model to build great products." And honestly, that's fair. Cursor, arguably the most useful AI coding tool on the market, does the same thing.
But Cursor isn't pitching itself as the world's first general AI agent that "extends your human reach." It's a coding assistant. It knows what it is. Manus wants to be the operating layer between you and the entire internet, and it's doing that with a leased engine and duct tape.

The verdict: world-class demo, mid-tier product

Manus is not vaporware. The technology is real. The architecture is interesting. The vision — autonomous agents completing full workflows while you're offline — is genuinely where things are going.
But right now, here's what you're actually buying:
  • A Claude wrapper with a project manager that sometimes ghosts you mid-task
  • Beautiful demo videos of tasks it cannot reliably replicate in the real world
  • A 2.5% automation rate on actual work
  • CAPTCHA sensitivity that would embarrass a free browser extension
  • The "80-90% done" treadmill that burns your credits and your clients' patience
The hype wasn't completely made up. It just ran about two years ahead of the product. And when you're charging subscription fees on an exclusive waitlist and burning $75 million in VC money, "two years ahead of the product" is a polite way of saying "you shipped marketing first and then quietly hoped the software would catch up."
It hasn't. Yet.