AI failed at over 96% of jobs.
A new report from Scale AI and the Center for AI Safety tested AI agents against 240 real freelance projects from Upwork. The best agent completed only 2.5% of them to acceptable quality.
The report: https://lnkd.in/gnSb7MPJ
Meanwhile, on the same feed, founders are claiming AI replaced half their team.
Same AI models. Opposite conclusions.
My personal take is that the researchers may not have had the domain expertise needed to automate the work they were testing. I know some tech-savvy entrepreneurs (myself included) who are already automating work that would previously have required hiring people.
The difference has nothing to do with AI capability.
It’s a difference in mental model when using AI.
One group expects AI to be an expert of all trades.
They give AI the brief, expect senior-level output, and dismiss it when the result is average.
“See? AI is overhyped.”
The other group treats AI like a junior employee.
They know LLMs are an averaging technology: by design, the output converges toward the statistical mean of the training data. Left unguided, that means average output.
So they don’t expect expert output out of the box.
They invest in “training” the junior.
Curate skill files written by domain experts.
Build workflows based on domain knowledge.
Write detailed specifications.
Review, iterate, refine.
That’s when the output stops looking “average.”
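As a rough illustration of that "training" loop, here is a minimal Python sketch. It assumes the skill files are Markdown notes in a single folder and that the review step is a simple keyword check; the function names, folder layout, and section headings are all hypothetical, and the call to the model itself is deliberately left out.

```python
from pathlib import Path


def build_context(brief: str, spec: str, skill_dir: Path) -> str:
    """Assemble one prompt from expert skill files, a spec, and the brief."""
    sections = []
    # Skill files: hypothetical Markdown notes written by domain experts.
    # Sorting keeps the prompt stable from one session to the next.
    for path in sorted(skill_dir.glob("*.md")):
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"## Skill: {path.stem}\n{body}")
    # The detailed specification comes before the task so the model
    # reads the constraints first.
    sections.append(f"## Specification\n{spec.strip()}")
    sections.append(f"## Task\n{brief.strip()}")
    return "\n\n".join(sections)


def missing_requirements(output: str, required_terms: list[str]) -> list[str]:
    """Review step: list required terms the draft output never mentions."""
    lowered = output.lower()
    return [term for term in required_terms if term.lower() not in lowered]
```

In use, the string returned by build_context would be sent to whichever model you work with, and anything returned by missing_requirements would go back into the next iteration of the spec or the skill files.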
I’ve been in both groups.
When I first started building apps with Claude Code, I threw a PRD at it and got demos that broke in production. Classic Group 1.
3 production apps later, I now write detailed project files, create expert skill specifications, and treat every AI session like onboarding a new hire.
Same tool. Completely different output.
This pattern holds across functions:
- A marketer who loads brand voice and campaign references gets strategic output. One who types “write me an ad” gets slop.
- A developer who gives detailed specs and test harnesses gets production code. One who says “build me an app” gets a demo.
The difference is never the AI.
Next time an AI tool disappoints you, check your workflow. Not the model.
LLMs produce average output by design. The expert’s job is to guide the model and raise the ceiling.
#AI #AIAgents #LLM #ExpertInTheLoop