Running local models is good now

1456 points · 558 comments on HN · read original →

Points and comments are a snapshot, not live.

Local models now achieve ~75% of frontier model speed and accuracy for agentic coding tasks.

The author, using a 2022 M2 Mac with 64GB RAM, finds local models like Gemma-4-12b-qat finally capable of agentic coding, refactoring Python, writing unit tests, and bootstrapping repos. They note significant improvement over 6 months ago, when such tasks were impossible locally. They share a setup using Pi agent harness and LM Studio inference server, with Docker for security, and highlight benefits like low cost, full introspection of token processing, and ability to tweak parameters, though readiness for production is uncertain.

What commenters are saying

The thread splits into two camps: one affirming the value of local models for pros willing to invest in hardware (a 64GB Mac is ~$2k used), and another arguing the barrier remains high for most global earners. Commenters pushing back note that even 12GB VRAM can run capable models like Gemma 4 or Qwen 3.5 MoE at 30-40 tokens/s, and that cloud subscriptions remain cheaper and easier for many. Some doubt a mass migration from hosted AI will occur, citing the long trend toward outsourcing infrastructure. A known comparative benchmark site (llmcheck.net) is shared.