Every Byte Matters

255 points · 134 comments on HN

Data layout drastically affects performance: struct-of-arrays can beat array-of-structs by up to 30x in sequential access, and total working set size determines random-access latency.

Modern CPUs fetch a full 64-byte cache line even when a program reads a single byte. If a struct is 64 bytes and only one of its fields is needed, each object fetched fills a cache line with mostly unused data. A struct-of-arrays layout, where each field lives in its own contiguous array, packs up to 64 values of a one-byte field into each cache line. For sequential access over 1 KiB structs, this yields up to a 30x speedup. For random access, struct size matters because it scales the working set: doubling the struct size doubles the total bytes, which can push the data into a slower cache level. 512 monsters at 64 bytes each total 32 KiB and fit in L1d (~3 nanoseconds latency); the same 512 monsters at 128 bytes each total 64 KiB and spill to L2 (~11 nanoseconds). The CPU prefetcher cannot predict random jumps, so the total size of the collection determines which cache level serves each access.
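The cache-line arithmetic above can be checked with a small model. The following Python sketch is not from the original article: it only counts which 64-byte lines a scan would touch, and the function names, the one-byte field, and the 32 KiB L1d size are all assumptions made for illustration. It does not measure real timings.

```python
CACHE_LINE = 64          # bytes per cache line, as stated in the text
L1D_BYTES = 32 * 1024    # assumption: a 32 KiB L1 data cache; varies by CPU


def cache_lines_touched(addresses, line_size=CACHE_LINE):
    """Count the distinct cache lines covered by a list of byte addresses."""
    return len({addr // line_size for addr in addresses})


def aos_field_addresses(count, struct_size, field_offset=0):
    """Address of one field in each element of an array of structs."""
    return [i * struct_size + field_offset for i in range(count)]


def soa_field_addresses(count, field_size=1):
    """Address of each element in a dedicated, contiguous field array."""
    return [i * field_size for i in range(count)]


def working_set_bytes(count, struct_size):
    """Total bytes a random-access workload may touch across the collection."""
    return count * struct_size


if __name__ == "__main__":
    monsters = 512  # hypothetical collection size, matching the example above

    # Sequential scan of a single one-byte field in each layout.
    aos = cache_lines_touched(aos_field_addresses(monsters, struct_size=64))
    soa = cache_lines_touched(soa_field_addresses(monsters, field_size=1))
    print(f"AoS scan touches {aos} lines, SoA scan touches {soa} lines")

    # Random access: compare the whole collection against the assumed L1d.
    for size in (64, 128):
        total = working_set_bytes(monsters, size)
        fits = "fits in" if total <= L1D_BYTES else "exceeds"
        print(f"{size}-byte struct: {total // 1024} KiB, {fits} the assumed L1d")
```

Lines touched is only a proxy for memory traffic. The measured speedup depends on prefetching, field width, and the rest of the workload, which is why the 64x gap in this model does not translate directly into the 30x figure quoted above.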

What the HN community is saying

The thread splits between readers who value this knowledge for performance-critical code and skeptics who note that I/O and business-logic problems dominate real-world Java deployments. Language design also came up: Odin and Julia have struct-of-arrays helpers, and some commenters suggest that languages should automatically convert array-of-structs syntax to struct-of-arrays storage. One commenter defended the JVM in detail, noting that Project Valhalla will improve object headers, arguing that moving collectors are more efficient than malloc/free for large programs, and citing real sensor-fusion and air-traffic-control systems that migrated from C++ to Java for performance reasons.