Local Qwen isn't a worse Opus, it's a different tool

By Alex Ellis · Alex Ellis' Blog (2026-06-17) · On Hacker News (2026-06-18)

470 points · 251 comments on HN

Points and comments are a snapshot, not live.

Local models like Qwen are a different tool from frontier AIs, not a cheaper replacement.

Alex Ellis, founder of OpenFaaS, recounts running local Qwen models on an RTX 6000 Pro (96GB, ~$12K–$15K). The card paid for itself by helping recover under-reported customer licenses. Local models excel at privacy-sensitive tasks (e.g., analyzing customer telemetry in an airgapped VM) and come with fixed costs, but they fail on long, unsupervised tasks: they fall into loops, hallucinate filenames, and get arithmetic wrong. Ellis contrasts this with Claude Opus, which can work unattended for 15 minutes on complex Go distributed-systems code. He advises scoping local tasks tightly and never leaving them unattended.
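Ellis's advice to scope local runs tightly and supervise them can be backed by cheap mechanical checks. The sketch below is not from the article and is only an illustration: it assumes a hypothetical agent loop that reports each action as a string and lists the file paths it intends to touch. `LoopGuard` (a hypothetical name) caps the number of steps and stops when the same action repeats too often, one crude signal of a loop. `missing_files` (also hypothetical) flags referenced paths that do not exist on disk, a simple guard against hallucinated filenames. The thresholds are arbitrary placeholders.

```python
from collections import Counter
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class LoopGuard:
    """Hypothetical guard for a local-model agent loop.

    Stops the run when the step budget is spent or the same action
    repeats too often, so a human can take over.
    """
    max_steps: int = 20   # assumed budget; tune per task
    max_repeats: int = 3  # identical actions allowed before stopping
    steps: int = 0
    seen: Counter = field(default_factory=Counter)

    def check(self, action: str) -> Optional[str]:
        """Record one action; return a reason to stop, or None to continue."""
        self.steps += 1
        self.seen[action] += 1
        if self.steps > self.max_steps:
            return "step budget exhausted"
        if self.seen[action] > self.max_repeats:
            return f"repeated action {action!r}"
        return None


def missing_files(paths: List[str], root: str) -> List[str]:
    """Return the paths the model referenced that do not exist under root."""
    base = Path(root)
    return [p for p in paths if not (base / p).exists()]


if __name__ == "__main__":
    guard = LoopGuard(max_steps=10, max_repeats=2)
    for action in ["read main.go", "edit main.go", "read main.go", "read main.go"]:
        reason = guard.check(action)
        if reason:
            print("stop:", reason)  # stop: repeated action 'read main.go'
            break
```

Exact-match repetition only catches the most obvious loops. A model that cycles between two slightly different actions would need a fuzzier comparison, which is one more reason a human should stay in the loop.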

What commenters are saying

Commenters generally agree that local models are limited but useful for specific tasks. The top comment notes that they are power-hungry and slow, but shine for privacy and predictable, repetitive work. Others report success coding with Qwen 3.6 27B at 40–50 tokens/s on a 4090, and with a dual-model setup (a small, fast model for simple tasks and a large model for complex ones). One commenter says each model family needs its own prompting style: precise for GPT, indirect for Claude, XML/list-driven for Qwen. The idea of a local model that escalates to the cloud when it is out of its depth draws interest, but is acknowledged to be hard because models are overconfident about their own answers.
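The escalation idea can be sketched as a local-first router. This is a hypothetical illustration, not something described in the article or thread: `route`, `call_local`, `call_cloud`, and `verify` are made-up names, and the model calls are plain callables standing in for real inference clients. Because commenters note that models overrate their own answers, the sketch does not ask the model how confident it is; it accepts a local answer only if an external check passes (for code, that might be a build or test run). An `allow_cloud` flag keeps privacy-sensitive work on the machine. The same shape fits the dual-model setup if `call_local` is the small model and `call_cloud` is the large one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    text: str
    escalated: bool  # True if the cloud model produced the answer


def route(
    prompt: str,
    call_local: Callable[[str], str],
    call_cloud: Callable[[str], str],
    verify: Callable[[str], bool],
    max_local_attempts: int = 2,
    allow_cloud: bool = True,
) -> Result:
    """Hypothetical local-first router with escalation to a cloud model.

    A local answer is accepted only if an external check (verify) passes,
    instead of trusting the model's own, often overstated, confidence.
    """
    for _ in range(max_local_attempts):
        answer = call_local(prompt)
        if verify(answer):
            return Result(answer, escalated=False)
    if not allow_cloud:
        # Privacy-sensitive work must not leave the machine: fail loudly instead.
        raise RuntimeError("local model failed verification; cloud escalation disabled")
    return Result(call_cloud(prompt), escalated=True)


if __name__ == "__main__":
    # Stub models for illustration; real ones would call an inference server.
    r = route(
        "What is 2 + 2? Answer with a number.",
        call_local=lambda p: "5",
        call_cloud=lambda p: "4",
        verify=lambda a: a == "4",
    )
    print(r)  # Result(text='4', escalated=True)
```

The hard part the thread points to remains: `verify` only exists for tasks with checkable output. For open-ended answers, there is no cheap external signal, and the router falls back to trusting whichever model answers.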