http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47556

--- Comment #2 from Jeremy Fitzhardinge <jeremy at goop dot org> 2011-01-31 22:16:54 UTC ---
Hm, yes, I see.  The hand-written asm, which uses %ah, does appear to run into
false partial register stalls according to 3.5.2.3 in the Intel Optimisation
Reference Manual.

On the other hand, the code generated from the C version measures slightly
slower on a Nehalem system.  Since the code in question is all in the slow
path (it's the spin loop for a spinlock), perhaps the icache pressure from
the larger code size is more significant than the register stalls.
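
For readers without the original source at hand, here is a rough sketch of
the kind of spin loop being discussed.  It is not the code from this report.
It assumes a ticket-style lock that packs an 8-bit "head" (now serving) and an
8-bit "tail" (next ticket) into one 16-bit word, which is the sort of layout
where hand-written asm would naturally reach for %al/%ah.  The names
ticket_lock_t, ticket_lock, ticket_unlock and cpu_relax are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: low byte = head (now serving),
 * high byte = tail (next ticket to hand out). */
typedef struct {
    _Atomic uint16_t slock;
} ticket_lock_t;

/* Spin-wait hint; expands to nothing on non-x86 or non-GNU compilers. */
static inline void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_ia32_pause();
#endif
}

static void ticket_lock(ticket_lock_t *l)
{
    /* Take a ticket: bump the tail byte and fetch the previous word. */
    uint16_t old = atomic_fetch_add_explicit(&l->slock, 0x0100,
                                             memory_order_acquire);
    uint8_t ticket = (uint8_t)(old >> 8);
    uint8_t head = (uint8_t)old;

    /* Slow path: spin until the head byte reaches our ticket. */
    while (head != ticket) {
        cpu_relax();
        head = (uint8_t)atomic_load_explicit(&l->slock,
                                             memory_order_acquire);
    }
}

static void ticket_unlock(ticket_lock_t *l)
{
    /* Advance the head byte only; it must not carry into the tail. */
    uint16_t old = atomic_load_explicit(&l->slock, memory_order_relaxed);
    uint16_t next;
    do {
        next = (uint16_t)((old & 0xff00) | ((old + 1) & 0x00ff));
    } while (!atomic_compare_exchange_weak_explicit(&l->slock, &old, next,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

Compiling something like this with "gcc -O2 -S" and comparing the loop body
against a hand-written version is one way to see the code-size difference in
question.  The unlock here uses a compare-and-swap loop only to stay within
portable C11 atomics; a hand-written version would more likely use a single
byte-sized increment.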

Compiling with -Os rather than -O2 makes no difference.
