AMD releases "Barcelona" software optimization guide

AMD just released the software optimization guide for their K10 ("Barcelona") CPU line. You can get it from here. It seems the SSE units really are 128 bits wide (as expected). Instruction fetch has been improved to keep the wider execution units fed (fetch was already a bottleneck on the K8 architecture, so it's nice to see it fixed). The decoders seem to be better, and some instructions are faster now. A "sideband stack optimizer" was added to speed up PUSH/POP instructions. Large-page support has been extended with 1 GiB pages. All in all it looks like an evolutionary update; I'm curious to see it benchmarked.