DeepSeek V4 Pro beats GPT-5.5 Pro on precision

RuntimeWire (2026-06-08) · On Hacker News (2026-06-08)

344 points · 175 comments on HN

Points and comments are a snapshot, not live.

DeepSeek V4 Pro outscored GPT-5.5 Pro on a four-task benchmark evaluated by an AI judge.

The article reports that DeepSeek V4 Pro scored 38.0 against 33.0 for OpenAI's GPT-5.5 Pro across four freshly generated text tasks, with grok-4-1-fast-non-reasoning acting as the judge. The tasks were created on the fly so that neither model could be prepared for them in advance. The available content gives no further detail on the tasks, the methodology, or the domains covered.

What commenters are saying

The thread's top comments focused on methodological weaknesses: the benchmark used only four tasks and a single judge model (grok-4-1-fast, which was retired a month earlier and now routes to a different system), so the sample is too small to support reliable conclusions. Separately, a developer shared cost data from a vulnerability-scanning benchmark in which DeepSeek V4 Pro cost roughly $0.10 per case and GPT-5.5 Pro cost $22 per case. GPT-5.5 Pro hit its budget limit before finishing, while DeepSeek and other cheaper models produced comparable or better results.

The conversation then turned to broader concerns: frontier labs resisting price cuts because of enterprise lock-in, data-privacy worries about both US and Chinese providers, and whether open-weight models should be self-hosted. One commenter noted that the article's language reads as AI-generated.
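To put the per-case costs quoted in the thread in proportion, here is a minimal arithmetic sketch. Only the two prices come from the discussion, and the DeepSeek figure is approximate. The $100 budget and the function names are hypothetical, chosen for illustration, since the thread does not state the actual budget limit. Amounts are kept in whole cents to avoid floating-point rounding.

```python
# Per-case costs as quoted in the thread, in US cents.
# The DeepSeek figure is approximate ("roughly $0.10").
DEEPSEEK_CENTS_PER_CASE = 10
GPT_CENTS_PER_CASE = 2200

# Hypothetical budget for illustration only; the thread gives no figure.
ASSUMED_BUDGET_CENTS = 100 * 100  # $100


def cost_ratio(expensive_cents: int, cheap_cents: int) -> float:
    """Return how many times more the expensive option costs per case."""
    return expensive_cents / cheap_cents


def cases_within_budget(budget_cents: int, cents_per_case: int) -> int:
    """Return the number of whole cases that fit in a fixed budget."""
    return budget_cents // cents_per_case


if __name__ == "__main__":
    ratio = cost_ratio(GPT_CENTS_PER_CASE, DEEPSEEK_CENTS_PER_CASE)
    print(f"Per-case cost ratio: {ratio:.0f}x")
    for name, cents in [
        ("DeepSeek V4 Pro", DEEPSEEK_CENTS_PER_CASE),
        ("GPT-5.5 Pro", GPT_CENTS_PER_CASE),
    ]:
        n = cases_within_budget(ASSUMED_BUDGET_CENTS, cents)
        print(f"{name}: {n} cases within the assumed budget")
```

At the quoted prices the gap is about 220x per case, which is consistent with the report that the more expensive model exhausted its budget before completing the run.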

No clear consensus emerged on whether the benchmark finding itself was meaningful given its scale, but cost and performance trade-offs dominated the discussion.