Opus 4.5 refused to work OT recently. So I interviewed 7 free candidates.

Claude Code with Opus 4.5 has been my most productive team member for the past 3 months.


But recently, he started complaining:

“I’ve done $200 worth of work this month. No more overtime.”

Meanwhile, my LinkedIn feed is flooded with creators promoting free AI candidates:

“Claude Code + Ollama = free unlimited coding.”

Free labor vs. a diva employee who refuses to work after hours?

Time to run a proper hiring process.

Not to replace the diva, but to give him extra hands.

I don’t expect the cheap labor to be very intelligent. The main criterion is whether they can follow instructions.

Round 1: I interviewed 7 local LLMs on instruction-following.

Test: Refer to an HTML template and code a new page given new info.

Results?

  • All 7 failed at research tasks (couldn’t even visit a URL)

  • 2 could follow template instructions perfectly

  • 4 drank some alcohol before the interview but still passed

  • 1 failed

Full results: https://lnkd.in/gcpTcrZC

Round 2: Can they design a database given best practices?

Round 1 tested simple instruction-following. Copy a template, fill in the blanks.

Round 2 raises the bar: Follow a detailed best practice document when designing a schema.

We already know local LLMs can’t research or plan. But if a senior engineer writes the best practices, can they apply them?

I created a 12-point database best practice guide covering normalization, naming conventions, ID strategy (ULID), subtype separation, FK strategies, and soft delete patterns.
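To give a feel for what two of those topics can look like in practice, here is a minimal sketch of a ULID format check and a soft delete helper. The guide's actual rules are not reproduced in this post, so every name below (`isUlid`, `SoftDeletable`, `softDelete`, `activeRows`) is hypothetical, and the code uses plain TypeScript rather than Drizzle so it runs without dependencies.

```typescript
// Hypothetical illustration only: the post does not show the guide's exact rules.

// A ULID is 26 characters of Crockford Base32 (digits and uppercase letters,
// excluding I, L, O and U). The first character is at most 7 because the
// value is 128 bits wide.
const ULID_PATTERN = /^[0-7][0-9A-HJKMNP-TV-Z]{25}$/;

function isUlid(id: string): boolean {
  return ULID_PATTERN.test(id);
}

// Soft delete: rows are never removed, they are stamped with a deletion time.
interface SoftDeletable {
  id: string;
  deletedAt: Date | null; // null means the row is still active
}

// Returns a copy of the row marked as deleted, leaving the input untouched.
function softDelete<T extends SoftDeletable>(row: T, now: Date = new Date()): T {
  return { ...row, deletedAt: now };
}

// Filters out rows that have been soft-deleted.
function activeRows<T extends SoftDeletable>(rows: T[]): T[] {
  return rows.filter((row) => row.deletedAt === null);
}
```

A reviewer could use checks like these when scoring a generated schema, for example to confirm that sample IDs are ULID-shaped and that every table carries a nullable deletion timestamp.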

Task:

“Design a schema for a SaaS with the following requirements:

  • Use Drizzle ORM

  • Support multi-tenant, each tenant is a team

  • A team can have multiple users

  • A user can join multiple teams

  • Subscription is linked to team

  • Support Stripe, Google and Apple Pay

  • Support email + OTP, Google and Apple Sign In

  • User info: name, company, tenure in company

Save as a markdown file with an entity diagram, followed by Drizzle ORM schema.

Follow the best practices in your design.”
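For orientation, the requirements above map to roughly the following entities. This is a rough sketch in plain TypeScript types, not Drizzle code and not the output of any model in the test. All names are hypothetical, and storing tenure as a number of months is an assumption made only for illustration.

```typescript
// Hypothetical entity sketch derived from the task requirements.

type PaymentProvider = "stripe" | "google_pay" | "apple_pay";
type AuthProvider = "email_otp" | "google" | "apple";

// Each tenant is a team.
interface Team {
  id: string;
  name: string;
}

interface User {
  id: string;
  name: string;
  company: string;
  companyTenureMonths: number; // assumption: tenure stored as whole months
}

// Join entity: a team has many users, and a user can join many teams.
interface TeamMembership {
  teamId: string;
  userId: string;
}

// The subscription belongs to the team, not to an individual user.
interface Subscription {
  id: string;
  teamId: string;
  provider: PaymentProvider;
}

// One row per sign-in method, so a user can have several.
interface AuthIdentity {
  id: string;
  userId: string;
  provider: AuthProvider;
}

// Lists every team a given user belongs to.
function teamIdsForUser(memberships: TeamMembership[], userId: string): string[] {
  return memberships.filter((m) => m.userId === userId).map((m) => m.teamId);
}
```

The many-to-many link between users and teams is the part most likely to trip up a model, because it needs a separate join entity instead of a foreign key on either side.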

This tests whether they can apply written rules consistently, not just copy templates.

Results? Opus 4.5 scored a perfect 22/22. But the surprise:

🥇 Opus 4.5: 22/22 (the benchmark)

🥈 GPT-OSS-20B: 21.5/22

🥉 GLM-4.7-Flash: 19.5/22

  4. Qwen3-Coder-30B: 19/22

  5. Qwen3-VL-32B: 18.5/22

  6. Devstral Small 2: 17/22

  7. Qwen3-VL-30B: 16/22

  8. Qwen3-30B: 15.5/22

I scored them on 10 points for task completion and 12 points for best practice adherence.
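The arithmetic behind the totals is simple: 10 + 12 = 22 points maximum. As a small sketch, the totals reported above can be turned into percentages and gaps to the benchmark like this. Only the totals come from the post. The per-category split for each model is not given here, and the helper names are hypothetical.

```typescript
// Maximum points per category, as stated in the post.
const MAX_TASK_POINTS = 10;
const MAX_PRACTICE_POINTS = 12;
const MAX_TOTAL = MAX_TASK_POINTS + MAX_PRACTICE_POINTS; // 22

interface ModelScore {
  model: string;
  total: number; // total out of 22, as reported in the ranking
}

const reportedScores: ModelScore[] = [
  { model: "Opus 4.5", total: 22 },
  { model: "GPT-OSS-20B", total: 21.5 },
  { model: "GLM-4.7-Flash", total: 19.5 },
  { model: "Qwen3-Coder-30B", total: 19 },
  { model: "Qwen3-VL-32B", total: 18.5 },
  { model: "Devstral Small 2", total: 17 },
  { model: "Qwen3-VL-30B", total: 16 },
  { model: "Qwen3-30B", total: 15.5 },
];

// Score as a percentage of the maximum, rounded to one decimal place.
function percentOfMax(total: number): number {
  return Math.round((total / MAX_TOTAL) * 1000) / 10;
}

// Points behind the benchmark model (the first entry in the list).
function gapToBenchmark(score: ModelScore, scores: ModelScore[]): number {
  return scores[0].total - score.total;
}
```

On these numbers, the best local model trails the benchmark by half a point and the weakest by 6.5 points.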

The full evaluation, with detailed scoring for all 8 models, is in the carousel.

Have you tried local LLMs for database design or architecture tasks? What’s your experience?

#ClaudeCode #Ollama #LocalLLM #AIEngineering #VibeCoding #DatabaseDesign
