AI failed at more than 96% of jobs.

A new report from Scale AI and the Center for AI Safety tested AI agents against 240 real freelance projects from Upwork. The best agent completed only 2.5% of them to acceptable quality.

The report: https://lnkd.in/gnSb7MPJ

Meanwhile, on the same feed, founders are claiming AI replaced half their team.

Same AI models. Opposite conclusions.

My personal take is that the researchers may not have had the domain expertise to automate the work they were testing. I know some tech-savvy entrepreneurs (myself included) who are already automating work that would once have required hiring people.

The difference has nothing to do with AI capability.

It’s a difference in mental model when using AI.

One group expects AI to be an expert in every trade.

They give AI the brief, expect senior-level output, and dismiss it when the result is average.

“See? AI is overhyped.”

The other group treats AI like a junior employee.

They know LLMs are an averaging technology: the output tends toward the most statistically typical response in the training data. By design. This means average output.

So they don’t expect expert output out of the box.

They invest in “training” the junior.

Curate skill files written by domain experts.

Build workflows based on domain knowledge.

Write detailed specifications.

Review, iterate, refine.

That’s when the output stops looking “average.”
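Here is a minimal sketch of what that "onboarding" can look like in practice. It only assembles text; it does not call any model. The `Briefing` class, the `build_prompt` function, and the sample content are all hypothetical names invented for illustration, not part of any tool mentioned here.

```python
from dataclasses import dataclass, field


@dataclass
class Briefing:
    """Everything a 'junior' would need before starting (hypothetical structure)."""
    task: str
    skill_notes: list[str] = field(default_factory=list)      # expert-written guidance
    specification: list[str] = field(default_factory=list)    # concrete requirements
    review_feedback: list[str] = field(default_factory=list)  # notes from earlier rounds


def build_prompt(briefing: Briefing) -> str:
    """Assemble one prompt string; sections with no content are left out."""
    sections = [
        ("Task", [briefing.task]),
        ("Expert guidance", briefing.skill_notes),
        ("Specification", briefing.specification),
        ("Feedback from the last review", briefing.review_feedback),
    ]
    parts = []
    for title, items in sections:
        if items:
            parts.append(title + ":\n" + "\n".join(f"- {item}" for item in items))
    return "\n\n".join(parts)


# Group 1: the bare brief.
bare = Briefing(task="Write me an ad")

# Group 2: the same kind of request, with the junior properly onboarded.
onboarded = Briefing(
    task="Write a LinkedIn ad for our analytics product",
    skill_notes=["Lead with the customer's problem, not the feature list"],
    specification=["Under 60 words", "One call to action"],
    review_feedback=["Previous draft sounded too generic"],
)

print(build_prompt(bare))
print()
print(build_prompt(onboarded))
```

The point of the sketch is the contrast between the two printed prompts: the second one carries the domain knowledge, the spec, and the review notes that the first one leaves the model to guess.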

I’ve been in both groups.

When I first started building apps with Claude Code, I threw a PRD at it and got demos that broke in production. Classic Group 1.

Three production apps later, I now write detailed project files, create expert skill specifications, and treat every AI session like onboarding a new hire.

Same tool. Completely different output.

This pattern holds across functions:

  • A marketer who loads brand voice and campaign references gets strategic output. One who types “write me an ad” gets slop.

  • A developer who gives detailed specs and test harnesses gets production code. One who says “build me an app” gets a demo.
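For the developer case, one way to picture "specs and test harnesses" is a spec written as input/output pairs plus a small runner that checks any candidate against it. This is only a sketch under assumptions: `candidate_slugify` stands in for AI-generated code, and `SPEC_CASES` and `run_harness` are hypothetical names made up for this example.

```python
import re
from typing import Callable


def candidate_slugify(title: str) -> str:
    """Stand-in for AI-generated code (a hypothetical example function)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


# The specification, written as (input, expected output) pairs.
SPEC_CASES = [
    ("Hello World", "hello-world"),
    ("  Trim me  ", "trim-me"),
    ("AI & LLMs: 2025!", "ai-llms-2025"),
    ("", ""),
]


def run_harness(func: Callable[[str], str]) -> list[str]:
    """Describe each failing case; an empty list means the spec is met."""
    failures = []
    for given, expected in SPEC_CASES:
        actual = func(given)
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
    return failures


failures = run_harness(candidate_slugify)
print("spec met" if not failures else "\n".join(failures))
```

With a harness like this, "review, iterate, refine" becomes concrete: any failure messages can go straight back into the next session as feedback.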

The difference is never the AI.

Next time an AI tool disappoints you, check your workflow. Not the model.

LLMs produce average output by design. The expert’s job is to guide them and raise the ceiling.

#AI #AIAgents #LLM #ExpertInTheLoop
