On Mon, Feb 18, 2019 at 6:37 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Mon, Feb 18, 2019 at 3:22 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > > > > > > > > > > > On x86-64, since __m64 is returned and passed in XMM registers, we can emulate MMX intrinsics with SSE instructions.  To support it, we added
> > > > > > > > > > > >
> > > > > > > > > > > > #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
> > > > > > > > > > > >
> > > > > > > > > > > > ;; Define instruction set of MMX instructions
> > > > > > > > > > > > (define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
> > > > > > > > > > > >   (const_string "base"))
> > > > > > > > > > > >
> > > > > > > > > > > > (eq_attr "mmx_isa" "native")
> > > > > > > > > > > >   (symbol_ref "!TARGET_MMX_WITH_SSE")
> > > > > > > > > > > > (eq_attr "mmx_isa" "x64")
> > > > > > > > > > > >   (symbol_ref "TARGET_MMX_WITH_SSE")
> > > > > > > > > > > > (eq_attr "mmx_isa" "x64_avx")
> > > > > > > > > > > >   (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
> > > > > > > > > > > > (eq_attr "mmx_isa" "x64_noavx")
> > > > > > > > > > > >   (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
> > > > > > > > > > > >
> > > > > > > > > > > > We added SSE emulation to MMX patterns and disabled MMX alternatives with TARGET_MMX_WITH_SSE.
> > > > > > > > > > > >
> > > > > > > > > > > > Most MMX instructions have equivalent SSE versions, and the results of some SSE versions need to be reshuffled into the right order for MMX.  There are a couple of tricky cases:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the mask operand and handle the unmapped bits 64:127 at the memory address by adjusting the source and mask operands together with the memory address.
> > > > > > > > > > > >
> > > > > > > > > > > > 2. MMX movntq is emulated with SSE2 DImode movnti, which is available only in 64-bit mode.
> > > > > > > > > > > >
> > > > > > > > > > > > 3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.  SSE emulation must clear the extra (fourth) index bit in the shuffle control mask.
> > > > > > > > > > > >
> > > > > > > > > > > > 4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve the upper 64 bits of the destination XMM register.
> > > > > > > > > > > >
> > > > > > > > > > > > Tests are also added to check each SSE emulation of MMX intrinsics.
> > > > > > > > > > > >
> > > > > > > > > > > > There are no regressions on i686 and x86-64.  For x86-64, GCC is also tested with
> > > > > > > > > > > >
> > > > > > > > > > > >   --with-arch=native --with-cpu=native
> > > > > > > > > > > >
> > > > > > > > > > > > on AVX2 and AVX512F machines.
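Not part of the patch or the original mail, just a minimal self-contained C sketch of what tricky case 3 above amounts to at the intrinsics level (it assumes SSSE3, i.e. -mssse3, and the function name is invented for illustration):

  #include <tmmintrin.h>   /* SSSE3: _mm_shuffle_epi8 */

  /* The 64-bit pshufb indexes 8 bytes, so each control byte has 3 index
     bits; the 128-bit pshufb indexes 16 bytes and uses 4 index bits.  To
     reproduce the MMX behavior on the low 8 bytes of an XMM register, the
     extra index bit (0x08) has to be cleared in every control byte, or
     indices 8..15 would pull bytes from the upper half.  Bit 7, which
     requests a zero result byte, is left untouched.  */
  __m128i
  shuffle_low8_like_mmx (__m128i data, __m128i ctrl)
  {
    __m128i fixed = _mm_and_si128 (ctrl, _mm_set1_epi8 ((char) 0xf7));
    return _mm_shuffle_epi8 (data, fixed);
  }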
> > > > > > > > > > >
> > > > > > > > > > > An idea that would take the patch a step further, also on 32-bit targets:
> > > > > > > > > > >
> > > > > > > > > > > *Assuming* that operations on XMM registers are as fast as (or perhaps faster than) operations on MMX registers, we can change the mmx_isa attribute in e.g.
> > > > > > > > > > >
> > > > > > > > > > > +  "@
> > > > > > > > > > > +    p<logic>\t{%2, %0|%0, %2}
> > > > > > > > > > > +    p<logic>\t{%2, %0|%0, %2}
> > > > > > > > > > > +    vp<logic>\t{%2, %1, %0|%0, %1, %2}"
> > > > > > > > > > > +  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > > > > > > > > >
> > > > > > > > > > > to:
> > > > > > > > > > >
> > > > > > > > > > >   [(set_attr "isa" "*,noavx,avx")
> > > > > > > > > > >    (set_attr "mmx_isa" "native,*,*")]
> > > > > > > > > > >
> > > > > > > > > > > So, for x86_64 everything stays the same, but for x86_32 we now allow intrinsics to use xmm registers in addition to mmx registers.  We can't disable MMX for x86_32 anyway due to ISA constraints (and some tricky cases, e.g. movnti, which works only for 64-bit targets, and e.g. maskmovq & similar, which are more efficient with MMX regs), but the RA gets much more freedom to allocate the most effective register set even for 32-bit targets.
> > > > > > > > > > >
> > > > > > > > > > > WDYT?
> > > > > > > > > >
> > > > > > > > > > Since MMX registers are used to pass and return __m64 values, we can't really get rid of MMX instructions in 32-bit mode.  If people have to stay with 32-bit mode, they need MMX.  I don't think we should extend TARGET_MMX_WITH_SSE to 32-bit mode.
> > > > > > > > >
> > > > > > > > > No, TARGET_MMX_WITH_SSE is still enabled only for 64-bit targets.  We should not *disable* SSE alternatives on 32-bit targets.
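As an aside on the *assumption* in the suggestion above, and not code from this thread: the low 64 bits of these logic operations come out the same whichever register file they go through, which a self-contained C check like the following illustrates (needs -mmmx -msse2; the function name is invented):

  #include <string.h>
  #include <mmintrin.h>    /* MMX: _mm_and_si64 */
  #include <emmintrin.h>   /* SSE2: _mm_and_si128 */

  /* Compute a & b once with the MMX intrinsic and once with the SSE2 one,
     then report whether the low 64 bits agree (they always should).  */
  int
  pand_mmx_equals_sse (__m64 a, __m64 b)
  {
    __m64 via_mmx = _mm_and_si64 (a, b);

    __m128i xa = _mm_set_epi64 (_mm_setzero_si64 (), a);
    __m128i xb = _mm_set_epi64 (_mm_setzero_si64 (), b);
    __m64 via_sse = _mm_movepi64_pi64 (_mm_and_si128 (xa, xb));

    _mm_empty ();   /* clear the MMX/x87 state before any FP code runs */

    return memcmp (&via_mmx, &via_sse, sizeof via_mmx) == 0;
  }

Whether the XMM form is also as fast as assumed is of course a question for the hardware, not for the semantics.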
> > > > > > >
> > > > > > > I don't think my patch set disables any SSE alternatives in 32-bit mode.  However, it DOES NOT enable any SSE alternatives in 32-bit mode.  To really enable SSE alternatives in
> > > > > > >
> > > > > > > (define_insn "*mmx_<code><mode>3"
> > > > > > >   [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
> > > > > > >         (any_logic:MMXMODEI
> > > > > > >           (match_operand:MMXMODEI 1 "register_mmxmem_operand" "%0,0,Yv")
> > > > > > >           (match_operand:MMXMODEI 2 "register_mmxmem_operand" "ym,x,Yv")))]
> > > > > > >   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
> > > > > > >    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> > > > > > >   "@
> > > > > > >    p<logic>\t{%2, %0|%0, %2}
> > > > > > >    p<logic>\t{%2, %0|%0, %2}
> > > > > > >    vp<logic>\t{%2, %1, %0|%0, %1, %2}"
> > > > > > >   [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > > > > >    (set_attr "type" "mmxadd,sselog,sselog")
> > > > > > >    (set_attr "mode" "DI,TI,TI")])
> > > > > > >
> > > > > > > register_mmxmem_operand must return true for SSE alternatives:
> > > > > >
> > > > > > It returns true for register and memory operands for 32-bit targets, because
> > > > > >
> > > > > >   #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
> > > > >
> > > > > Will
> > > > >
> > > > >   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv"))))]
> > > > >
> > > > > work well with RA?  I got some wrong code before register_mmxmem_operand was added to match "ym,x,Yv".
> > > >
> > > > I see no reason why it shouldn't.
> > >
> > > This will be equivalent to replacing register_operand in
> > >
> > >   [(match_operand:VI1_AVX512VLBW 1 "register_operand" "v")
> > >
> > > with nonimmediate_operand.  If it should work, I can do it in i386.md and sse.md to check it out.
> >
> > I tried:
> >
> >   sed -i -e "s/\"register_operand\"[ \t]\+\(\"[^=^\+^f]\+\"[^=]\+$\)/\"nonimmediate_operand\" \1/" i386.md
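For illustration only, not one of the tests from the patch: this is the shape of C input where operand 2 of the pattern quoted above comes from memory, i.e. where the "ym" vs. "x"/"Yv" alternatives being discussed actually matter (the function name is made up):

  #include <mmintrin.h>

  /* One source operand lives in memory.  With the MMX alternative it can
     be used directly as an "ym" operand; under TARGET_MMX_WITH_SSE the
     register-only SSE alternatives force a load into an XMM register
     first.  */
  __m64
  and_with_memory_operand (__m64 a, const __m64 *p)
  {
    return _mm_and_si64 (a, *p);
  }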
>
> I don't know what is the point in changing these operands, but

The point is we can't replace register_operand with nonimmediate_operand in all places.

> (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")
>
> should work without problems.
>

32-bit MMX has very low priority.  I will try it in the second phase.

--
H.J.