Hi, 

I was going through the implementation of the "monitor" and "mwait" builtins.
I need clarification on the parameters passed to the _mm_mwait intrinsic.

We have the following defined in "pmmintrin.h":

extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_monitor (void const * __P, unsigned int __E, unsigned int __H)
{
  __builtin_ia32_monitor (__P, __E, __H);
}

extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_mwait (unsigned int __E, unsigned int __H)
{
  __builtin_ia32_mwait (__E, __H);
}

I assume the parameter names indicate:
P -> Address
E -> Extensions
H -> Hints

MWAIT as per the AMD ISA manual:
Ref: 
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2008/10/24594_APM_v3.pdf
(---Snip---)
EAX specifies optional hints for the MWAIT instruction. There are currently no
hints defined and all bits should be 0. Setting a reserved bit in EAX is ignored
by the processor.
ECX specifies optional extensions for the MWAIT instruction. The only extension
currently defined is ECX bit 0, which allows interrupts to wake MWAIT, even when
eFLAGS.IF = 0. Support for this extension is indicated by a feature flag
returned by the CPUID instruction. Setting any unsupported bit in ECX results in
a #GP exception.
(---Snip---)

MWAIT as defined in the Intel ISA manual:
Ref: 
http://www.intel.in/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
(---Snip---)
This instruction's operation is the same in non-64-bit modes and 64-bit mode.
ECX specifies optional extensions for the MWAIT instruction. EAX may contain
hints such as the preferred optimized state the processor should enter. The
first processors to implement MWAIT supported only the zero value for EAX and
ECX. Later processors allowed setting ECX[0] to enable masked interrupts as
break events for MWAIT (see below). Software can use the CPUID instruction to
determine the extensions and hints supported by the processor.
(---Snip---)


So if a user calls _mm_mwait (__E, __H), __E should go into ECX and __H
should go into EAX.

However, I see the following implementation in GCC:

(---snip---)
  case IX86_BUILTIN_MWAIT:
      arg0 = CALL_EXPR_ARG (exp, 0);
      arg1 = CALL_EXPR_ARG (exp, 1);
      op0 = expand_normal (arg0);
      op1 = expand_normal (arg1);
      if (!REG_P (op0))
        op0 = copy_to_mode_reg (SImode, op0);
      if (!REG_P (op1))
        op1 = copy_to_mode_reg (SImode, op1);
      emit_insn (gen_sse3_mwait (op0, op1));
      return 0;


(define_insn "sse3_mwait"
  [(unspec_volatile [(match_operand:SI 0 "register_operand" "a")
                     (match_operand:SI 1 "register_operand" "c")]
                    UNSPECV_MWAIT)]
  "TARGET_SSE3"
;; 64bit version is "mwait %rax,%rcx". But only lower 32bits are used.
;; Since 32bit register operands are implicitly zero extended to 64bit,
;; we only need to set up 32bit registers.
  "mwait"
  [(set_attr "length" "3")])
(---snip---)

Here the first argument, __E, is moved to EAX and the second argument, __H, is
moved to ECX.
Should the constraints be swapped for the operands in the pattern?
Or is my understanding wrong?
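
To make the question concrete, here is a small C sketch that only models which
value ends up in which register under the two readings. It does not execute
MWAIT. The struct and function names (mwait_regs, expected_mapping,
quoted_pattern_mapping, mappings_agree) are made up for illustration and are
not part of GCC or pmmintrin.h.

```c
/* Hypothetical model of the two registers that MWAIT reads. */
struct mwait_regs {
    unsigned int eax;  /* hints, per the manual text quoted above */
    unsigned int ecx;  /* extensions, per the manual text quoted above */
};

/* Mapping I would expect from the manuals: __E -> ECX, __H -> EAX. */
static struct mwait_regs expected_mapping(unsigned int e, unsigned int h)
{
    struct mwait_regs r;
    r.eax = h;
    r.ecx = e;
    return r;
}

/* Mapping implied by the quoted sse3_mwait pattern as I read it:
   operand 0 (__E) has constraint "a", operand 1 (__H) has constraint "c". */
static struct mwait_regs quoted_pattern_mapping(unsigned int e, unsigned int h)
{
    struct mwait_regs r;
    r.eax = e;
    r.ecx = h;
    return r;
}

/* Returns 1 when both mappings load identical register values, 0 otherwise. */
static int mappings_agree(unsigned int e, unsigned int h)
{
    struct mwait_regs want = expected_mapping(e, h);
    struct mwait_regs got = quoted_pattern_mapping(e, h);
    return want.eax == got.eax && want.ecx == got.ecx;
}
```

For example, a caller requesting the ECX bit 0 extension with no hints would
pass __E = 1 and __H = 0. In this model the two mappings then disagree, with the
1 landing in EAX instead of ECX. They agree only when both arguments are equal,
such as the all-zero case.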

Regards,
Venkat.
