1-Bit Bonsai Image 4B: Image Generation for Local Devices

PrismML (2026-05-26) · On Hacker News (2026-06-01)

442 points · 189 comments on HN

Points and comments are a snapshot, not live.

PrismML releases Bonsai Image 4B, a compressed image generation model that runs on iPhones and other local devices, with a transformer footprint 6.4x to 8.3x smaller than that of FLUX.2 Klein 4B.

Bonsai Image 4B comes in two variants: a 1-bit model with binary weights and a 0.93 GB transformer (an 8.3x reduction from FLUX.2 Klein 4B's 7.75 GB), and a ternary model with three-state weights at 1.21 GB (a 6.4x reduction). Both retain group-wise FP16 scaling for precision-sensitive layers. On an iPhone 17 Pro Max, the models generate a 512x512 image in 9.4 seconds; on a Mac M4 Pro, they run up to 5.6x faster than full-precision FLUX. Across three benchmarks (GenEval, HPSv3, DPG-Bench), the ternary variant retains 95% of FLUX.2 Klein's accuracy, while the 1-bit variant retains 88%. The total deployment payload on Apple Silicon is 3.42 GB (1-bit) and 3.88 GB (ternary), versus 15.97 GB for the full-precision model. The models are released under Apache 2.0 with open weights; a Bonsai Studio iOS app and a WebGPU demo are available.
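To make "binary or ternary weights with group-wise FP16 scaling" concrete, here is a minimal Python sketch of one common way such a scheme can work. It is an illustration under stated assumptions, not PrismML's actual method: the function names, the mean-absolute-value scale, the group size of 4, and the 0.5 ternary threshold are all hypothetical choices made for this example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value (struct's 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_group(weights, ternary=False):
    """Map one group of weights to low-bit codes plus a single FP16 scale.

    Binary mode uses codes in {-1, +1}; ternary mode uses {-1, 0, +1}.
    Assumption: the scale is the mean absolute value of the weights that
    keep a nonzero code. Real schemes may pick the scale differently.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    if ternary:
        # Hypothetical rule: weights well below the group's mean magnitude become 0.
        threshold = 0.5 * mean_abs
        codes = [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in weights]
        kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
        scale = sum(kept) / len(kept) if kept else 0.0
    else:
        codes = [1 if w >= 0 else -1 for w in weights]
        scale = mean_abs
    return codes, to_fp16(scale)


def quantize(weights, group_size=4, ternary=False):
    """Split a flat weight list into groups and quantize each independently."""
    groups = [weights[i:i + group_size] for i in range(0, len(weights), group_size)]
    return [quantize_group(g, ternary) for g in groups]


def dequantize(quantized):
    """Rebuild approximate weights: each code times its group's scale."""
    return [c * scale for codes, scale in quantized for c in codes]


def reduction(full_gb: float, compressed_gb: float) -> float:
    """Size ratio, rounded to one decimal as in the summary above."""
    return round(full_gb / compressed_gb, 1)


if __name__ == "__main__":
    w = [1.0, -0.125, 1.0, -1.0, 0.5, -0.25, 0.75, -0.5]
    print(quantize(w, ternary=False))
    print(quantize(w, ternary=True))
    print(dequantize(quantize(w, ternary=True)))
    # Reported transformer sizes in GB: 7.75 (FLUX.2 Klein 4B), 0.93 (1-bit), 1.21 (ternary).
    print(reduction(7.75, 0.93), reduction(7.75, 1.21))
```

Under this scheme each group stores one small code per weight plus one 16-bit scale, so larger groups push storage closer to the raw code width at the cost of a coarser scale. The `reduction` helper simply recomputes the reported ratios from the reported sizes.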

What commenters are saying

The top comment notes that FLUX.2 is technically a rectified flow model, not a diffusion model, though commenters accept "diffusion" as an umbrella term. The broader thread pivots to local inference economics: one user reports running five agents on a $3k Asus GB10, processing 394M input tokens over 30 days for roughly $15-35 in electricity versus $1600-1700 via API, and adds that GitHub Copilot's $39/month subscription provides surprising token value for auxiliary tasks. Another commenter argues that local hardware cannot compete with datacenter economics at scale, because datacenters pool resource utilization across many users. The thread also surfaces Taalas, an ASIC-based inference startup claiming 17k tokens/second, though the project appears to have been dormant since launch.
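The commenter's local-versus-API figures can be turned into a rough break-even estimate. The sketch below uses only the numbers quoted in the thread ($3k hardware, $15-35 electricity, $1600-1700 API cost, 394M input tokens per 30 days). The function names are hypothetical, and the calculation assumes that usage stays constant month to month and that local and API output quality are interchangeable, neither of which the thread establishes.

```python
def breakeven_months(hardware_usd: float, api_usd_per_month: float,
                     power_usd_per_month: float) -> float:
    """Months until hardware cost is recovered by avoided API spend."""
    savings = api_usd_per_month - power_usd_per_month
    if savings <= 0:
        raise ValueError("local running cost is not below the API cost")
    return hardware_usd / savings


def implied_usd_per_million_tokens(api_usd: float, tokens_millions: float) -> float:
    """Effective API price per million input tokens implied by the totals."""
    return api_usd / tokens_millions


if __name__ == "__main__":
    # Figures as quoted by the commenter; every other name here is illustrative.
    hardware = 3000.0
    slowest = breakeven_months(hardware, api_usd_per_month=1600.0, power_usd_per_month=35.0)
    fastest = breakeven_months(hardware, api_usd_per_month=1700.0, power_usd_per_month=15.0)
    print(f"break-even between {fastest:.2f} and {slowest:.2f} months")
    print(f"implied API price: {implied_usd_per_million_tokens(1600.0, 394.0):.2f} to "
          f"{implied_usd_per_million_tokens(1700.0, 394.0):.2f} USD per million input tokens")
```

At the quoted figures the hardware would pay for itself in a little under two months. This depends on the heavy, sustained utilization the commenter describes, which is the same factor the datacenter-economics counterargument turns on: a lightly used local machine recovers its cost far more slowly.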