Hot Take: Vibe Coding Won't Replace Software Engineers

Here, I share my journey from a strong believer to a skeptic.

Anyone who claims otherwise is either:

Has access to a powerful LLM not yet released
Is marketing a vibe coding platform
Is at The Peak of Inflated Expectation of hype cycle
Confuses software engineers with coders

TLDR

Vibe coding is good for simple apps that are mostly UI and CRUD via API.
It still can’t handle big tasks.
You need a software engineer to break tasks into bite-size and orchestrate the agents.
It still falls short on basic business logic.
You need a PM to test and review the business logic.

If you think this is another denial post by a software engineer, I invite you to read on to reassess your opinion.

Why Should You Trust Me?

Don’t get me wrong, this post is not to discredit the value of vibe coding. I am paying Claude Code Max 20x for S$300 per month and will continue to pay.

It is to share reality based on my recent experience with Claude Code, as well as insights I’ve received from others I trust.

My background:

I have built production-grade software for 19 years as a technopreneur.
Coach software engineers for clean code, system architecture, database design and more.
Have used Cursor for more than a year on existing projects.
Have used Claude Code to build 3 mobile apps with production-grade backend recently.

My Hypothesis

As a technopreneur, I always have a lot of ideas. My bottleneck has always been which idea is worth building to test market demand. There is a huge opportunity cost for each idea you build to test - from product development to marketing.

The rise of GenAI in recent years is a golden window of opportunity for any entrepreneur:

It is a new enabling technology that cuts across many industries and use cases, giving opportunity to transform many industries.
The promise of vibe coding means the opportunity cost of building will drop drastically.
The power of AI agents means you just need a lean team to make a business work, drastically reducing the risk.

I know vibe coding has its limitations out-of-the-box:

LLM is an averaging technology.
It produces average code by design.
The bottleneck will be in code review.

But I have a hypothesis:

LLM has expert knowledge in their training data that can be unlocked with the right prompt.
If I could give it best practices to follow, I could unlock the power of Claude Code.

My grand plan? Use Claude Code to churn out many apps that I could test the market.

The Experiment: 3 Apps in 2 Months

So, I started a new venture 2 months ago. I started by developing apps that I need myself as a parent. Check my profile for what I am building.

The results? I managed to build 3 mobile apps in 2 months. 3 simple apps, but with production-grade backend.

And my honest take on vibe coding? If it could replace software engineers, I would have done it in 1 month.

This is what actually happened.

App 1: Spell Parrot (Expected 2 weeks → Actual 3 weeks)

A simple app for parents to snap a photo of their kids’ spelling list. Then kids can practice their spelling without needing a parent to sit side-by-side to read.

Not too bad, but here’s what happened:

I wanted to make sure my app is built on scalable architecture in case it goes viral.
This means it shouldn’t be a POC.
It should use components like serverless, queue, pub/sub, object storage, CDN, etc.
Claude Code is not good at system architecture out-of-the-box.
So I ended up spending 12 hours a day, 7 days a week, pair programming with Claude Code.
I used this as a chance to see where Claude Code produces suboptimal code to create best practice guides.
I ended up with 7 best practice guides that cover: Coding, database, GraphQL, React Native, security, testing, UX.

App 2: AI Parrot (Expected 1 week → Actual 3 days… then 10+ days of bug fixing)

A ChatGPT for Kids. No, I don’t train my own LLM. It is multi-agent orchestration with the right prompts to guide AI to give guiding responses to kids, instead of direct answers. I built it so that I can expose my kids to AI, without worrying they just copy answers.

This was where the magic happened.

I spent 1 day discussing the PRD with Claude Code.

Then I asked Claude Code to:

Follow the 7 best practice guides.
Refer to code in Spell Parrot.
Implement AI Parrot based on the PRD.

Claude Code finished the implementation over a dinner.

I compiled it and did a quick check. Wow. The UI is decent and looks 80% usable. I spent the next 2 days doing manual UI testing and tidying up some minor bugs. This was when I was at The Peak of Inflated Expectation in the hype cycle. I thought, my hypothesis was right!

Then reality hit: Subscription.

I launched Spell Parrot as a free app, since it is so simple that I am not sure if anyone is willing to pay. And the cost of running it is very low for me.

For AI Parrot, the costs of LLM API scared me when I ran the financial modelling. So, I decided to at least introduce subscription as early as possible.

I thought subscription is a very common feature for most apps. LLM being an averaging technology, it should know how to code subscription easily. I expected Claude Code to finish the feature in 1 day.

The results? 10+ days of bug fixing.

To summarize, for mobile apps, we always need to implement subscription for 3 providers - Apple, Google and Stripe (for web). I spent a day doing manual testing, only to realize Claude Code used poor database design and webhook architecture. Spent a couple of days doing pair programming with Claude Code to craft best practice for subscription and webhook. Spent a few more days doing manual testing, only to realize Claude Code hallucinated on refactoring. It told me it refactored the code based on my best practice on subscription and webhook. Added many comments in the code. But I noticed it didn’t change some of the actual code.

Article content

Claude Code wrote comments as if the function was implemented.

I wouldn’t go into the technical details. Let me know if you want to know the details.

I thought. Well. Maybe subscription is just an isolated case where LLM couldn’t excel.

App 3: Grammar Parrot (Expected 3 days → Actual 2 weeks)

This is an app I want to build to allow my kids to learn and practice grammar in a smarter way.

I spent 1 day discussing the PRD with Claude Code.

Then I asked Claude Code to:

Follow the 9 best practice guides.
Refer to code in AI Parrot.
Implement Grammar Parrot based on the PRD, with subscription.

To me, any software engineer, given these 3 things, will be able to deliver the app 90% ready, albeit needing some time to code. If vibe coding can replace software engineers, I would expect this to be done in 3 days - a few hours to code, 2 to 3 days testing.

Yet, I ended up spending 2 weeks.

This is where many of Claude Code’s limitations started to be exposed.

The 3 Major Limitations of Claude Code

Limitation 1: Poor Instruction Following

It started to violate best practices.
It didn’t strictly reuse code in AI Parrot.
It hallucinated new code that violated best practices.

Claude Code can’t actually follow your best practice guides closely

Limitation 2: Struggles with Basic Logic

This is just one example.

One of the features of my app is to create mixed topic assessments.
I expected the logic to select questions from multiple topics with a random distribution.
When I tested the app, it only selected questions from 1 topic.
It turns out that Claude Code’s logic was to use SELECT … WHERE topic_id IN (x, y, z) LIMIT 20… then shuffle it.
A software engineer will quickly see the problem - this is not random but based on records order in database.
I gave Claude Code a chance to fix it. It failed to fix it after trying for 40 mins.

Claude Code can’t catch this logic bug

Limitation 3: Claims Completion Prematurely (The Most Fatal One)

It is quite common for Claude Code to claim it finished all the tasks in a plan, when it hasn’t really finished. Or worse, it finished with shortcuts. Some examples:

It claimed to have finished implementing all the features in the PRD. But when tested, half of the features were missing.
I asked it to use subagents to craft the seed content for 18 topics. It spawned 18 subagents, worked for an hour, claimed to finish the tasks. But the content was “coming soon” for 14 out of 18 topics. I repeated this 3 times to get the job done.I caught Claude Code taking shortcut

I caught Claude Code taking shortcut

Claude Code completed 14 out of 18 tasks by taking shortcut

Conclusion

So, can vibe coding replace software engineers?

Not yet. And maybe not ever.

Here’s what I learned from 2 months of intensive vibe coding:

(1) Vibe coding is a powerful accelerator, not a replacement.

It can turn a 6-month project into 2 months. But to create a usable app in 1 day? That is hype.

(2) You still need a software engineer to:

Design scalable architecture
Break complex tasks into manageable pieces
Review and catch logic errors
Verify that the AI actually did what it claimed

(3) You still need a PM to:

Test business logic thoroughly
Catch missing features
Validate against requirements

(4) The “80% done” trap is real.

Claude Code can get you to 80% fast. But that last 20% - the edge cases, the business logic, the integration quirks - is where software engineering experience matters most.

(5) AI can replace coders, not software engineers.

What’s the difference? A coder translates requirements into code. A software engineer designs systems, makes architectural decisions, anticipates edge cases, and ensures the code is maintainable, scalable, and correct. AI is great at the former. It struggles with the latter.

Will I stop using Claude Code? Absolutely not. I’m still paying S$300/month and will continue to do so.

But I’ve adjusted my expectations.

Vibe coding is my copilot, not my replacement. It helps me code faster, but it doesn’t help me think less.

What’s your experience with vibe coding? Am I missing something, or does this resonate with you?

If you found this useful, follow me for more honest takes on AI and software development.