AI failed at more than 96% of jobs.
A new report from Scale AI and the Center for AI Safety tested AI agents against 240 real freelance projects from Upwork. The best agent completed only 2.5% of them to acceptable quality.
The report: https://lnkd.in/gnSb7MPJ
Meanwhile, on the same feed, founders are claiming AI replaced half their team.
Same AI models. Opposite conclusions.
My take: the researchers may not have had the domain expertise needed to automate the work they were testing. I know tech-savvy entrepreneurs (myself included) who are already automating work that used to require hiring people.
The difference has nothing to do with AI capability.
It’s a difference in mental model when using AI.
One group expects AI to be an expert of all trades.
They give AI the brief, expect senior-level output, and dismiss it when the result is average.
“See? AI is overhyped.”
The other group treats AI like a junior employee.
They know LLMs are an averaging technology: they are trained to predict the most likely text, so without guidance the output drifts toward the typical answer in the training data. By design. That means average output.
So they don’t expect expert output out of the box.
They invest in “training” the junior.
Curate skill files written by domain experts.
Build workflows based on domain knowledge.
Write detailed specifications.
Review, iterate, refine.
That’s when the output stops looking “average.”
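For the technically inclined, here is a rough Python sketch of what that "onboarding" can look like. It is only an illustration: the `Briefing` class, its field names, and the sample text are all made up for this post, and it only assembles a prompt string rather than calling any particular model or tool.

```python
from dataclasses import dataclass, field


@dataclass
class Briefing:
    """Everything the "junior" receives before starting (hypothetical structure)."""
    task: str
    skill_notes: list[str] = field(default_factory=list)      # expert-written guidance
    specification: str = ""                                    # detailed requirements
    review_feedback: list[str] = field(default_factory=list)  # notes from earlier rounds

    def to_prompt(self) -> str:
        """Join the non-empty sections into one prompt, with the task last."""
        sections = []
        if self.skill_notes:
            notes = "\n".join(f"- {note}" for note in self.skill_notes)
            sections.append("## Expert guidance\n" + notes)
        if self.specification:
            sections.append("## Specification\n" + self.specification)
        if self.review_feedback:
            feedback = "\n".join(f"- {item}" for item in self.review_feedback)
            sections.append("## Feedback from the last round\n" + feedback)
        sections.append("## Task\n" + self.task)
        return "\n\n".join(sections)


# Group 1: hand over the brief and hope for senior-level output.
bare = Briefing(task="Write an ad for our running shoes.")

# Group 2: same task, but with context written by someone who knows the domain.
onboarded = Briefing(
    task="Write an ad for our running shoes.",
    skill_notes=["Lead with the customer's problem, not the product."],
    specification="Audience: first-time marathon runners. Length: under 50 words.",
    review_feedback=["The last draft was too generic; mention race-day nerves."],
)

bare_prompt = bare.to_prompt()
onboarded_prompt = onboarded.to_prompt()
```

The task line is identical in both cases. What changes is how much domain knowledge travels with it, and the feedback field is where each review round leaves its mark for the next one.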
I’ve been in both groups.
When I first started building apps with Claude Code, I threw a PRD at it and got demos that broke in production. Classic Group 1.
Three production apps later, I write detailed project files, create expert skill specifications, and treat every AI session like onboarding a new hire.
Same tool. Completely different output.
This pattern holds across functions:
- A marketer who loads brand voice and campaign references gets strategic output. One who types “write me an ad” gets slop.
- A developer who gives detailed specs and test harnesses gets production code. One who says “build me an app” gets a demo.
The difference is never the AI.
Next time an AI tool disappoints you, check your workflow. Not the model.
LLMs produce average output by design. The expert’s job is to guide them and raise the ceiling.
#AI #AIAgents #LLM #ExpertInTheLoop