On Sun, Feb 17, 2019 at 12:46 PM H.J. Lu <[email protected]> wrote:
>
> On Sun, Feb 17, 2019 at 10:49 AM Uros Bizjak <[email protected]> wrote:
> >
> > On Sun, Feb 17, 2019 at 6:37 PM H.J. Lu <[email protected]> wrote:
> >
> > > > > > > > > > On x86-64, since __m64 is returned and passed in XMM registers,
> > > > > > > > > > we can emulate MMX intrinsics with SSE instructions. To support
> > > > > > > > > > it, we added
> > > > > > > > > >
> > > > > > > > > > #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
> > > > > > > > > >
> > > > > > > > > > ;; Define instruction set of MMX instructions
> > > > > > > > > > (define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
> > > > > > > > > > (const_string "base"))
> > > > > > > > > >
> > > > > > > > > > (eq_attr "mmx_isa" "native")
> > > > > > > > > > (symbol_ref "!TARGET_MMX_WITH_SSE")
> > > > > > > > > > (eq_attr "mmx_isa" "x64")
> > > > > > > > > > (symbol_ref "TARGET_MMX_WITH_SSE")
> > > > > > > > > > (eq_attr "mmx_isa" "x64_avx")
> > > > > > > > > > (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
> > > > > > > > > > (eq_attr "mmx_isa" "x64_noavx")
> > > > > > > > > > (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
> > > > > > > > > >
> > > > > > > > > > We added SSE emulation to MMX patterns and disabled MMX
> > > > > > > > > > alternatives with TARGET_MMX_WITH_SSE.
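> > > > > > > > > >
> > > > > > > > > > As a concrete illustration (my own example, not part of the
> > > > > > > > > > patch), a plain MMX intrinsic like this one should now compile
> > > > > > > > > > to the SSE form on x86-64 instead of the MMX form:
> > > > > > > > > >
> > > > > > > > > > #include <mmintrin.h>
> > > > > > > > > >
> > > > > > > > > > /* With TARGET_MMX_WITH_SSE this is expected to assemble to
> > > > > > > > > >    paddw on an XMM register, since __m64 values live in XMM
> > > > > > > > > >    registers on x86-64.  */
> > > > > > > > > > __m64
> > > > > > > > > > add_pi16 (__m64 a, __m64 b)
> > > > > > > > > > {
> > > > > > > > > >   return _mm_add_pi16 (a, b);
> > > > > > > > > > }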
> > > > > > > > > >
> > > > > > > > > > Most MMX instructions have equivalent SSE versions, and the
> > > > > > > > > > results of some SSE versions need to be reshuffled into the
> > > > > > > > > > right order for MMX. There are a couple of tricky cases:
> > > > > > > > > >
> > > > > > > > > > 1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent. We
> > > > > > > > > > emulate MMX maskmovq with SSE2 maskmovdqu by zeroing out the
> > > > > > > > > > upper 64 bits of the mask operand, and we handle the unmapped
> > > > > > > > > > bits 64:127 at the memory address by adjusting the source and
> > > > > > > > > > mask operands together with the memory address.
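> > > > > > > > > >
> > > > > > > > > > A minimal sketch of the basic idea in C intrinsics (my
> > > > > > > > > > rendering, with the __m64 values already widened to __m128i;
> > > > > > > > > > the address adjustment for a possibly unmapped second half is
> > > > > > > > > > not shown):
> > > > > > > > > >
> > > > > > > > > > #include <emmintrin.h>
> > > > > > > > > >
> > > > > > > > > > static void
> > > > > > > > > > emulate_maskmovq (__m128i src, __m128i mask, char *p)
> > > > > > > > > > {
> > > > > > > > > >   /* Zero bits 64:127 of the mask so maskmovdqu never
> > > > > > > > > >      stores bytes 8..15.  */
> > > > > > > > > >   mask = _mm_move_epi64 (mask);
> > > > > > > > > >   _mm_maskmoveu_si128 (src, mask, p);
> > > > > > > > > > }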
> > > > > > > > > >
> > > > > > > > > > 2. MMX movntq is emulated with SSE2 DImode movnti, which is
> > > > > > > > > > only available in 64-bit mode.
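> > > > > > > > > >
> > > > > > > > > > In intrinsics terms (my sketch, assuming x86-64), the
> > > > > > > > > > non-temporal MMX store becomes a single 64-bit movnti:
> > > > > > > > > >
> > > > > > > > > > #include <emmintrin.h>
> > > > > > > > > >
> > > > > > > > > > static void
> > > > > > > > > > emulate_movntq (long long *p, long long x)
> > > > > > > > > > {
> > > > > > > > > >   _mm_stream_si64 (p, x);  /* movnti with a 64-bit operand */
> > > > > > > > > > }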
> > > > > > > > > >
> > > > > > > > > > 3. MMX pshufb takes a 3-bit index while SSE pshufb takes a
> > > > > > > > > > 4-bit index. SSE emulation must clear bit 4 in each byte of
> > > > > > > > > > the shuffle control mask.
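> > > > > > > > > >
> > > > > > > > > > A sketch of that fix-up (mine): clearing the 0x08 index bit
> > > > > > > > > > in every control byte keeps the 128-bit pshufb selecting only
> > > > > > > > > > from the low 8 bytes, matching the 3-bit MMX index, while the
> > > > > > > > > > zeroing bit (bit 7) is left intact:
> > > > > > > > > >
> > > > > > > > > > #include <tmmintrin.h>
> > > > > > > > > >
> > > > > > > > > > static __m128i
> > > > > > > > > > emulate_pshufb (__m128i x, __m128i ctrl)
> > > > > > > > > > {
> > > > > > > > > >   ctrl = _mm_and_si128 (ctrl, _mm_set1_epi8 ((char) 0xf7));
> > > > > > > > > >   return _mm_shuffle_epi8 (x, ctrl);
> > > > > > > > > > }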
> > > > > > > > > >
> > > > > > > > > > 4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must
> > > > > > > > > > properly preserve the upper 64 bits of the destination XMM
> > > > > > > > > > register.
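> > > > > > > > > >
> > > > > > > > > > One way to express the idea with intrinsics (my sketch; the
> > > > > > > > > > patch itself works at the RTL level): convert all four lanes,
> > > > > > > > > > then keep the two converted floats in the low half and the
> > > > > > > > > > destination's elements 2 and 3 in the high half:
> > > > > > > > > >
> > > > > > > > > > #include <emmintrin.h>
> > > > > > > > > >
> > > > > > > > > > static __m128
> > > > > > > > > > emulate_cvtpi2ps (__m128 dst, __m128i src)
> > > > > > > > > > {
> > > > > > > > > >   __m128 cvt = _mm_cvtepi32_ps (src);  /* cvtdq2ps */
> > > > > > > > > >   return _mm_shuffle_ps (cvt, dst, _MM_SHUFFLE (3, 2, 1, 0));
> > > > > > > > > > }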
> > > > > > > > > >
> > > > > > > > > > Tests are also added to check each SSE emulation of MMX
> > > > > > > > > > intrinsics.
> > > > > > > > > >
> > > > > > > > > > There are no regressions on i686 and x86-64. For x86-64, GCC
> > > > > > > > > > is also tested with
> > > > > > > > > >
> > > > > > > > > > --with-arch=native --with-cpu=native
> > > > > > > > > >
> > > > > > > > > > on AVX2 and AVX512F machines.
> > > > > > > > >
> > > > > > > > > An idea that would take the patch a step further, also on
> > > > > > > > > 32-bit targets:
> > > > > > > > >
> > > > > > > > > *Assuming* that operations on XMM registers are as fast as (or
> > > > > > > > > perhaps faster than) operations on MMX registers, we can change
> > > > > > > > > the mmx_isa attribute in e.g.
> > > > > > > > >
> > > > > > > > > + "@
> > > > > > > > > + p<logic>\t{%2, %0|%0, %2}
> > > > > > > > > + p<logic>\t{%2, %0|%0, %2}
> > > > > > > > > + vp<logic>\t{%2, %1, %0|%0, %1, %2}"
> > > > > > > > > + [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > > > > > > >
> > > > > > > > > to:
> > > > > > > > >
> > > > > > > > > [(set_attr "isa" "*,noavx,avx")
> > > > > > > > > (set_attr "mmx_isa" "native,*,*")]
> > > > > > > > >
> > > > > > > > > So, for x86_64 everything stays the same, but for x86_32 we
> > > > > > > > > now allow intrinsics to use XMM registers in addition to MMX
> > > > > > > > > registers. We can't disable MMX for x86_32 anyway due to ISA
> > > > > > > > > constraints (and some tricky cases, e.g. movnti, which works
> > > > > > > > > only for 64-bit targets, and e.g. maskmovq & similar, which
> > > > > > > > > are more efficient with MMX regs), but the RA has much more
> > > > > > > > > freedom to allocate the most effective register set even for
> > > > > > > > > 32-bit targets.
> > > > > > > > >
> > > > > > > > > WDYT?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Since MMX registers are used to pass and return __m64 values,
> > > > > > > > we can't really get rid of MMX instructions in 32-bit mode. If
> > > > > > > > people
> > > > > > > > have to stay with 32-bit mode, they need MMX. I don't think we
> > > > > > > > should
> > > > > > > > extend TARGET_MMX_WITH_SSE to 32-bit mode.
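> > > > > > > >
> > > > > > > > For illustration (my example; as I understand the 32-bit psABI,
> > > > > > > > __m64 arguments go in mm0-mm2 and the result comes back in mm0),
> > > > > > > > a function like
> > > > > > > >
> > > > > > > > #include <mmintrin.h>
> > > > > > > >
> > > > > > > > __m64
> > > > > > > > add_pi8 (__m64 a, __m64 b)
> > > > > > > > {
> > > > > > > >   return _mm_add_pi8 (a, b);
> > > > > > > > }
> > > > > > > >
> > > > > > > > must touch MMX registers at its boundaries in 32-bit mode no
> > > > > > > > matter how the body is implemented.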
> > > > > > >
> > > > > > > No, TARGET_MMX_WITH_SSE is still enabled only for 64-bit targets.
> > > > > > > We should not *disable* SSE alternatives on 32-bit targets.
> > > > >
> > > > > I don't think my patch set disables any SSE alternatives in 32-bit
> > > > > mode. However, it DOES NOT enable any SSE alternatives in 32-bit
> > > > > mode. To really enable SSE alternatives in
> > > > >
> > > > > (define_insn "*mmx_<code><mode>3"
> > > > >   [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
> > > > >         (any_logic:MMXMODEI
> > > > >           (match_operand:MMXMODEI 1 "register_mmxmem_operand" "%0,0,Yv")
> > > > >           (match_operand:MMXMODEI 2 "register_mmxmem_operand" "ym,x,Yv")))]
> > > > >   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
> > > > >    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> > > > >   "@
> > > > >    p<logic>\t{%2, %0|%0, %2}
> > > > >    p<logic>\t{%2, %0|%0, %2}
> > > > >    vp<logic>\t{%2, %1, %0|%0, %1, %2}"
> > > > >   [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > > >    (set_attr "type" "mmxadd,sselog,sselog")
> > > > >    (set_attr "mode" "DI,TI,TI")])
> > > > >
> > > > > register_mmxmem_operand must return true for SSE alternatives:
> > > >
> > > > It returns true for register and memory operands for 32-bit targets,
> > > > because
> > > >
> > > > #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
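> > > >
> > > > (Sketching the predicate's intended logic in C against GCC's internal
> > > > predicate API; my rendering, not the actual implementation:)
> > > >
> > > > /* Sketch only; relies on GCC's internal rtx/machine_mode types and
> > > >    the register_operand/memory_operand predicates.  Registers are
> > > >    always OK; memory is only OK when we are NOT emulating MMX with
> > > >    SSE, i.e. always on 32-bit targets.  */
> > > > static bool
> > > > register_mmxmem_operand_p (rtx op, machine_mode mode)
> > > > {
> > > >   return (register_operand (op, mode)
> > > >           || (!TARGET_MMX_WITH_SSE && memory_operand (op, mode)));
> > > > }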
> > >
> > > Will
> > >
> > > (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv"))))]
> > >
> > > work well with RA? I got some wrong code before register_mmxmem_operand
> > > was added to match "ym,x,Yv".
> >
> > I see no reason why it shouldn't.
>
> This will be equivalent to replacing register_operand in
>
> [(match_operand:VI1_AVX512VLBW 1 "register_operand" "v")
>
> with nonimmediate_operand. If it should work, I can do it in i386.md and
> sse.md to check it out.
>
I tried:

sed -i -e "s/\"register_operand\"[ \t]\+\(\"[^=^\+^f]\+\"[^=]\+$\)/\"nonimmediate_operand\" \1/" i386.md

and got
(gdb) call debug_rtx (insn)
(insn 65 19 67 2 (parallel [
(set (reg/f:SI 97)
(plus:SI (mem/u/c:SI (plus:SI (reg:SI 82)
(const:SI (unspec:SI [
(symbol_ref:SI ("gomp_tls_data") [flags 0x62] <var_decl 0x7fffea6c5e10 gomp_tls_data>)
] UNSPEC_GOTNTPOFF))) [17 S4 A8])
(mem/u/c:SI (const_int 0 [0]) [0 S4 A8 AS2])))
(clobber (reg:CC 17 flags))
])
"/export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c":139:7
-1
(expr_list:REG_DEAD (reg:SI 82)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUIV (symbol_ref:SI ("gomp_tls_data") [flags 0x62] <var_decl 0x7fffea6c5e10 gomp_tls_data>)
(nil)))))
(gdb) c
Continuing.
during RTL pass: ira
/export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c: In
function ‘gomp_test_nest_lock_25’:
/export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c:149:1:
internal compiler error: in elimination_costs_in_insn, at
reload1.c:3640
149 | }
| ^
0x108b258 elimination_costs_in_insn
/export/gnu/import/git/gitlab/x86-gcc/gcc/reload1.c:3637
0x108596f calculate_elim_costs_all_insns()
/export/gnu/import/git/gitlab/x86-gcc/gcc/reload1.c:1609
0xe61a7a ira_costs()
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira-costs.c:2298
0xe56613 ira_build()
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira-build.c:3432
0xe4b31d ira
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:5346
0xe4bba0 execute
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:5657
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
--
H.J.