Lines of code got a better publicist

410 points · 286 comments on HN · read original →

Points and comments are a snapshot, not live.

AI productivity claims shifted from outcome metrics to vanity volume metrics, obscuring evidence of actual impact.

Tech vendors (Google, Anthropic, OpenAI, Cursor) now publicize lines of code or percentage of code written by AI. This mirrors the discredited practice of measuring developer productivity by LOC. Earlier AI claims focused on outcomes: GitHub's Copilot study cited 55% task completion speedup. Recent research shows mixed results. Cui et al. found 26% task completion gains across 5,000 developers. GitClear reported code churn rising and refactoring collapsing with Copilot adoption. A 2025 METR study found experienced developers 19% slower with AI, though a 2026 follow-up reversed this with wider error margins and admitted measurement limitations.

At organizational level, an NBER survey of 6,000 executives found 69% using AI but 90% reporting no measurable productivity impact. Industry consensus sits around 10% organizational gains. High-profile layoffs (Block cut 40% of staff, Atlassian cut 10%) cited AI productivity gains despite no evidence of actual workforce idleness. The author argues these volume metrics are unfalsifiable and serve marketing rather than measurement, and advocates returning to battle-tested measures: DORA metrics, reliability, meaningful change rate, and revenue.

What commenters are saying

Top comment notes the circularity: 219 engineering leaders gave 219 different definitions of "AI-native engineering," undermining standardized measurement. A second camp points to Anthropic's contradiction: their marketing claims 8x code shipped while their own research found 17% lower code comprehension with no significant productivity gain. Several commenters cite OpenAI's blog post about million-LOC agent-generated code with no description of actual value or purpose, comparing unfavorably to the Linux kernel's 40M LOC. Skepticism about HN astroturfing and Anthropic hype appears in mid-ranked comments. One commenter notes "slop" resonates better than "technical debt" in conveying the scale and quality issues of AI-generated code.