Hi,

I was going through the "monitor" and "mwait" builtin implementation, and I need clarification on the parameters passed to the _mm_mwait intrinsic.

We have the following defined in "pmmintrin.h":

    extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
    _mm_monitor (void const * __P, unsigned int __E, unsigned int __H)
    {
      __builtin_ia32_monitor (__P, __E, __H);
    }

    extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
    _mm_mwait (unsigned int __E, unsigned int __H)
    {
      __builtin_ia32_mwait (__E, __H);
    }

I assume the parameter names indicate:

    P -> Address
    E -> Extensions
    H -> Hints

MWAIT as per the AMD ISA manual.
Ref: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2008/10/24594_APM_v3.pdf

(---Snip---)
EAX specifies optional hints for the MWAIT instruction. There are currently no hints defined and all bits should be 0. Setting a reserved bit in EAX is ignored by the processor.

ECX specifies optional extensions for the MWAIT instruction. The only extension currently defined is ECX bit 0, which allows interrupts to wake MWAIT, even when eFLAGS.IF = 0. Support for this extension is indicated by a feature flag returned by the CPUID instruction. Setting any unsupported bit in ECX results in a #GP exception.
(---Snip---)

MWAIT as defined in the Intel ISA manual.
Ref: http://www.intel.in/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

(---Snip---)
This instruction's operation is the same in non-64-bit modes and 64-bit mode. ECX specifies optional extensions for the MWAIT instruction. EAX may contain hints such as the preferred optimized state the processor should enter. The first processors to implement MWAIT supported only the zero value for EAX and ECX. Later processors allowed setting ECX[0] to enable masked interrupts as break events for MWAIT (see below). Software can use the CPUID instruction to determine the extensions and hints supported by the processor.
(---Snip---)

So if a user calls _mm_mwait (__E, __H), __E should go into ECX and __H should go into EAX.
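
To make that expectation concrete, here is a small sketch of the mapping I would expect from the two manuals. It is only a plain C model and does not execute MWAIT. The struct, the function, and the macro name are made up for illustration and are not part of GCC or pmmintrin.h.

```c
/* Illustrative model only: nothing here executes MWAIT.
   All names below are hypothetical, chosen for this example. */

/* ECX bit 0: allow interrupts to wake MWAIT even when EFLAGS.IF = 0
   (the only extension described in the manual excerpts above). */
#define MWAIT_ECX_INTERRUPT_BREAK 0x1u

/* The two registers MWAIT reads. */
struct mwait_regs
{
  unsigned int eax;  /* hints */
  unsigned int ecx;  /* extensions */
};

/* Register contents I would expect for _mm_mwait (__E, __H),
   assuming E means "extensions" and H means "hints". */
static struct mwait_regs
expected_mwait_regs (unsigned int extensions, unsigned int hints)
{
  struct mwait_regs regs;
  regs.eax = hints;       /* __H -> EAX */
  regs.ecx = extensions;  /* __E -> ECX */
  return regs;
}
```

For example, under this reading _mm_mwait (MWAIT_ECX_INTERRUPT_BREAK, 0) should end up with ECX = 1 and EAX = 0.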

However, I see the following implementation in GCC:

(---snip---)
    case IX86_BUILTIN_MWAIT:
      arg0 = CALL_EXPR_ARG (exp, 0);
      arg1 = CALL_EXPR_ARG (exp, 1);
      op0 = expand_normal (arg0);
      op1 = expand_normal (arg1);
      if (!REG_P (op0))
        op0 = copy_to_mode_reg (SImode, op0);
      if (!REG_P (op1))
        op1 = copy_to_mode_reg (SImode, op1);
      emit_insn (gen_sse3_mwait (op0, op1));
      return 0;

    (define_insn "sse3_mwait"
      [(unspec_volatile [(match_operand:SI 0 "register_operand" "a")
                         (match_operand:SI 1 "register_operand" "c")]
                        UNSPECV_MWAIT)]
      "TARGET_SSE3"
    ;; 64bit version is "mwait %rax,%rcx". But only lower 32bits are used.
    ;; Since 32bit register operands are implicitly zero extended to 64bit,
    ;; we only need to set up 32bit registers.
      "mwait"
      [(set_attr "length" "3")])
(---snip---)

Here the first argument, __E, is moved to EAX and the second, __H, is moved to ECX.

Should the constraints be swapped for the operands in the pattern? Or is my understanding wrong?

Regards,
Venkat.