> -----Original Message-----
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Thursday, May 10, 2018 3:26 PM
> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> Subject: Re: [PATCH][i386] Adding WAITPKG instructions
> 
> On Thu, May 10, 2018 at 2:50 PM, Peryt, Sebastian <sebastian.pe...@intel.com>
> wrote:
> > Hi Uros,
> >
> > Updated patch attached, please find comments below.
> >
> >> -----Original Message-----
> >> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> >> Sent: Wednesday, May 9, 2018 1:47 PM
> >> To: Peryt, Sebastian <sebastian.pe...@intel.com>
> >> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin <kirill.yuk...@gmail.com>
> >> Subject: Re: [PATCH][i386] Adding WAITPKG instructions
> >>
> >> On Tue, May 8, 2018 at 1:34 PM, Peryt, Sebastian
> >> <sebastian.pe...@intel.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > This patch adds support for WAITPKG instructions.
> >> >
> >> > Is it ok for trunk and after few day for backport to GCC-8?
> >> >
> > (Removed)
> >> >
> >> >
> >>
> >> +    case IX86_BUILTIN_UMONITOR:
> >> +      arg0 = CALL_EXPR_ARG (exp, 0);
> >> +      op0 = expand_normal (arg0);
> >> +      if (!REG_P (op0))
> >> +    op0 = ix86_zero_extend_to_Pmode (op0);
> >> +
> >> +      emit_insn (ix86_gen_umonitor (op0));
> >> +      return 0;
> >>
> >> Please see how movdir64b handles its address operand. Also, do not
> >> use global ix86_gen_monitor, just expand directly in the same way as
> movdir64b.
> >>
> >
> > Fixed.
> >
> >> +    case IX86_BUILTIN_UMWAIT:
> >> +    case IX86_BUILTIN_TPAUSE:
> >> +      rtx eax, edx, op1_lo, op1_hi;
> >> +      arg0 = CALL_EXPR_ARG (exp, 0);
> >> +      arg1 = CALL_EXPR_ARG (exp, 1);
> >> +      op0 = expand_normal (arg0);
> >> +      op1 = expand_normal (arg1);
> >> +      eax = gen_rtx_REG (SImode, AX_REG);
> >> +      edx = gen_rtx_REG (SImode, DX_REG);
> >> +      if (!REG_P (op0))
> >> +    op0 = copy_to_mode_reg (SImode, op0);
> >> +      if (!REG_P (op1))
> >> +    op1 = copy_to_mode_reg (DImode, op1);
> >> +      op1_lo = gen_lowpart (SImode, op1);
> >> +      op1_hi = expand_shift (RSHIFT_EXPR, DImode, op1,
> >> +             GET_MODE_BITSIZE (SImode), 0, 1);
> >> +      op1_hi = convert_modes (SImode, DImode, op1_hi, 1);
> >> +      emit_move_insn (eax, op1_lo);
> >> +      emit_move_insn (edx, op1_hi);
> >> +      emit_insn (fcode == IX86_BUILTIN_UMWAIT
> >> +        ? gen_umwait (op0, eax, edx)
> >> +        : gen_tpause (op0, eax, edx));
> >> +
> >> +      /* Return current CF value.  */
> >> +      op3 = gen_rtx_REG (CCCmode, FLAGS_REG);
> >> +      target = gen_rtx_LTU (QImode, op3, const0_rtx);
> >> +
> >> +      return target;
> >>
> >> For the above code, please see how xsetbv expansion and patterns are
> >> handling their input operands. There should be two patterns, one for
> >> 32bit and the other for 64bit targets. The patterns will need to set
> >> FLAGS_REG, otherwise the test will be removed.
> >>
> >
> > I copied what is done for xsetbv expansion and most likely I found some bug 
> > in
> GCC.
> > The problem is that when I use 3 arguments and compile as 64bit
> > version upper part of rax is not cleared. It doesn't appear when I'm using 
> > 2 or 4
> function arguments.
> > Most likely error is caused by the fact that rdx is used both as an
> > input for function and argument in instruction.
> 
> There is no need to clear upper parts of 64bit register. As specified in the 
> ISA
> (and modelled with RTX pattern), the instruction (e.g.
> tpause) reads only lower 32 bits from %rax and %rdx. Implicitly, the 
> instruction
> should ignore upper 32 bits by itself, so we can use SUBREGs. If this is not 
> the
> case, we need to use DImode input arguments in RTX pattern and explicitly emit
> zero-extension insns to clear upper 32 bits of input arguments.
> 

Ok, I agree with you regarding clearing.

But there is still one thing bothering me as explained in last email. The 
problem appears when I use 3
arguments and compile as 64bit version. Assembly generated is different from 
when I'm adding extra unused
argument or removing one function argument not related to my instruction. I'm 
talking about umonitor-1.c test, function bar.

Do you see the difference? This is the problem with clearing of registers I 
wrote previously. Why is this happening?
Is it a bug?

When using 3 operands:
bar:
.LFB5450:
        .cfi_startproc
        movq    %rdx, %rax
        umonitor        %rdi
        movq    %rdx, %rcx
        shrq    $32, %rcx
        movq    %rcx, %rdx
        umwait  %esi
        setc    %al
        ret
        .cfi_endproc

When using 4 operands:
bar:
.LFB5450:
        .cfi_startproc
        movl    %edx, %esi
        umonitor        %rdi
        movq    %rcx, %rax
        shrq    $32, %rax
        movq    %rax, %rdx
        movl    %ecx, %eax
        umwait  %esi
        setc    %al
        ret
        .cfi_endproc


Overall I understand that patch is ok for trunk?

> Uros.

Thanks,
Sebastian

Reply via email to