On 05/25/2011 06:26 PM, Andrew Pinski wrote:
> On Wed, May 25, 2011 at 10:19 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> --
>> H.J.
>> ---
>> Index: doc/extend.texi
>> ===================================================================
>> --- doc/extend.texi (revision 174216)
>> +++ doc/extend.texi (working copy)
>> @@ -8699,7 +8699,8 @@ The following built-in function is alway
>>
>> @table @code
>> @item void __builtin_ia32_pause (void)
>> -Generates the @code{pause} machine instruction with full memory barrier.
>> +Generates the @code{pause} machine instruction with a compiler memory
>> +barrier.
>
> What is the pause machine instruction do?

That's documented by Intel in the architecture manual.  Surely we don't
have to explain it all.

Andrew.


PAUSE—Spin Loop Hint

Improves the performance of spin-wait loops. When executing a “spin-wait
loop,” a Pentium 4 or Intel Xeon processor suffers a severe performance
penalty when exiting the loop because it detects a possible memory order
violation. The PAUSE instruction provides a hint to the processor that
the code sequence is a spin-wait loop. The processor uses this hint to
avoid the memory order violation in most situations, which greatly
improves processor performance. For this reason, it is recommended that
a PAUSE instruction be placed in all spin-wait loops.

An additional function of the PAUSE instruction is to reduce the power
consumed by a Pentium 4 processor while executing a spin loop. The
Pentium 4 processor can execute a spin-wait loop extremely quickly,
causing the processor to consume a lot of power while it waits for the
resource it is spinning on to become available. Inserting a pause
instruction in a spin-wait loop greatly reduces the processor’s power
consumption.

This instruction was introduced in the Pentium 4 processors, but is
backward compatible with all IA-32 processors. In earlier IA-32
processors, the PAUSE instruction operates like a NOP instruction. The
Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a
pre-defined delay. The delay is finite and can be zero for some
processors. This instruction does not change the architectural state of
the processor (that is, it performs essentially a delaying no-op
operation).

This instruction’s operation is the same in non-64-bit modes and 64-bit
mode.
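
For illustration only, here is roughly how the builtin might sit inside
the kind of spin-wait loop the manual text describes. This is a minimal
sketch, not anything taken from GCC or from the patch: the names
cpu_relax, demo_spinlock, demo_spin_lock, demo_spin_trylock and
demo_spin_unlock are hypothetical, the use of C11 <stdatomic.h> is an
assumption, and so is the fallback for non-x86 targets (a plain compiler
fence).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: emit the spin-loop hint where the builtin exists. */
static inline void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_ia32_pause();   /* the x86 PAUSE instruction */
#else
    /* Assumption: on other targets, fall back to a compiler-only fence. */
    atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* Hypothetical test-and-set spinlock built on a C11 atomic_flag. */
struct demo_spinlock {
    atomic_flag locked;
};

/* Try once; returns true if the lock was acquired. */
static inline bool demo_spin_trylock(struct demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Spin until the lock is acquired, issuing the hint on every failed try. */
static inline void demo_spin_lock(struct demo_spinlock *l)
{
    while (!demo_spin_trylock(l))
        cpu_relax();
}

static inline void demo_spin_unlock(struct demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Note that the ordering here comes from the acquire/release atomics, not
from the pause itself; per the manual text above, the instruction is only
a hint and does not change architectural state.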