Yesterday, we were discussing when we could actually let AI agents auto-merge and publish code.

My take: probably not anytime soon.

These screenshots show exactly why. They are the output Claude Code produced when I asked it to add a feature.

Before you put this down to a prompt or workflow issue: I have a comprehensive workflow that spawns 9 subagents to review code against 9 best practices. Two of those subagents review React Native and UX/UI specifically.

Yet it still makes these mistakes:

The UI does not respect the screen's safe area: the navigation bar overlaps the status bar.
Layout elements are misaligned.

Right now, LLM output is like a slot machine. Most spins land fine, but about 10% to 20% of the time it fails at even the simplest layout.

The code compiles. The app runs. But visually, it is off.
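To see why a 10% to 20% miss rate rules out auto-merging, here is a rough back-of-the-envelope sketch. It assumes, as a simplification, that every feature independently has the same chance of shipping a visual bug; the function name `chance_all_clean` is made up for illustration.

```python
# Simplifying assumption: each AI-generated feature independently has a
# fixed chance (failure_rate) of shipping a visual layout bug.
def chance_all_clean(failure_rate: float, features: int) -> float:
    """Probability that all `features` changes land with no visual bug."""
    return (1.0 - failure_rate) ** features


if __name__ == "__main__":
    # Compare the low and high ends of the 10% to 20% range.
    for rate in (0.10, 0.20):
        for n in (1, 5, 10):
            print(f"failure rate {rate:.0%}, {n:2d} features: "
                  f"{chance_all_clean(rate, n):.1%} all clean")
```

Under these assumptions, at a 10% miss rate the odds that ten auto-merged features all look right drop to about 35%; at 20%, to about 11%.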

For web apps, there is a test harness. Through browser extensions, Claude Code can now see your UI, spot visual issues, and self-correct. This helps a lot.

But for mobile apps? I still haven’t come across a way to let Claude Code see the screen.

And that, I think, is the real bottleneck in AI coding today. I spend more than 50% of my time on manual UI testing.

AI can write code that compiles. But right now, it can’t tell if the screen looks right.

Until an AI agent can see its own work the way a developer does, we probably still need a human UI tester.
