On Mon, Feb 11, 2019 at 7:09 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Sun, Feb 10, 2019 at 3:16 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On 2/10/19, H.J. Lu <hjl.to...@gmail.com> wrote:
> > > Emulate MMX pshufw with SSE.  Only SSE register source operand is allowed.
> > >
> > >       PR target/89021
> > >       * config/i386/mmx.md (mmx_pshufw_1): Add SSE emulation.
> > >       (*vec_dupv4hi): Likewise.
> > >       emulation.
> > > ---
> > >  gcc/config/i386/mmx.md | 33 +++++++++++++++++++++------------
> > >  1 file changed, 21 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> > > index 1ee51c5deb7..dc81d7f45df 100644
> > > --- a/gcc/config/i386/mmx.md
> > > +++ b/gcc/config/i386/mmx.md
> > > @@ -1364,7 +1364,8 @@
> > >    [(match_operand:V4HI 0 "register_operand")
> > >     (match_operand:V4HI 1 "nonimmediate_operand")
> > >     (match_operand:SI 2 "const_int_operand")]
> > > -  "TARGET_SSE || TARGET_3DNOW_A"
> > > +  "((TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE)
> > > +   || TARGET_3DNOW_A"
> >
> > I think that the above condition should read
> >
> > (TARGET_MMX || TARGET_MMX_WITH_SSE) && (TARGET_SSE || TARGET_3DNOW_A)
> >
> > and with TARGET_MMX_WITH_SSE (which implies SSE2) we always use XMM
> > registers. Without SSE2, we use MMX registers, as before.
>
> Done.
>
> > >  {
> > >    int mask = INTVAL (operands[2]);
> > >    emit_insn (gen_mmx_pshufw_1 (operands[0], operands[1],
> > > @@ -1376,14 +1377,15 @@
> > >  })
> > >
> > >  (define_insn "mmx_pshufw_1"
> > > -  [(set (match_operand:V4HI 0 "register_operand" "=y")
> > > +  [(set (match_operand:V4HI 0 "register_operand" "=y,Yv")
> > >          (vec_select:V4HI
> > > -          (match_operand:V4HI 1 "nonimmediate_operand" "ym")
> > > +          (match_operand:V4HI 1 "nonimmediate_operand" "ym,Yv")
> > >            (parallel [(match_operand 2 "const_0_to_3_operand")
> > >                       (match_operand 3 "const_0_to_3_operand")
> > >                       (match_operand 4 "const_0_to_3_operand")
> > >                       (match_operand 5 "const_0_to_3_operand")])))]
> > > -  "TARGET_SSE || TARGET_3DNOW_A"
> > > +  "((TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE)
> > > +   || TARGET_3DNOW_A"
> > >  {
> > >    int mask = 0;
> > >    mask |= INTVAL (operands[2]) << 0;
> > > @@ -1392,11 +1394,15 @@
> > >    mask |= INTVAL (operands[5]) << 6;
> > >    operands[2] = GEN_INT (mask);
> > >
> > > -  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
> > > +  if (TARGET_MMX_WITH_SSE)
> > > +    return "%vpshuflw\t{%2, %1, %0|%0, %1, %2}";
> > > +  else
> > > +    return "pshufw\t{%2, %1, %0|%0, %1, %2}";
> >
> > The above should be implemented as multi-output template.
>
> I have
>
> {
>   int mask = 0;
>   mask |= INTVAL (operands[2]) << 0;
>   mask |= INTVAL (operands[3]) << 2;
>   mask |= INTVAL (operands[4]) << 4;
>   mask |= INTVAL (operands[5]) << 6;
>   operands[2] = GEN_INT (mask);
>
>   if (TARGET_MMX_WITH_SSE)
>     return "%vpshuflw\t{%2, %1, %0|%0, %1, %2}";
>   else
>     return "pshufw\t{%2, %1, %0|%0, %1, %2}";
> }
>
> How can I build mask before multi-output template?

You are right, the mask has to be adjusted before output. Maybe we
should be more explicit here and select the template by alternative
(alternative 0 is the MMX "y" register, alternative 1 is the SSE "Yv"
register):

  switch (which_alternative)
    {
    case 0:
      return "pshufw\t{%2, %1, %0|%0, %1, %2}";
    case 1:
      return "%vpshuflw\t{%2, %1, %0|%0, %1, %2}";
    default:
      gcc_unreachable ();
    }

Uros.
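
For readers following the mask discussion above, here is a minimal standalone C sketch of how the four 2-bit selectors are packed into the 8-bit immediate, together with a scalar reference model of the resulting 4 x 16-bit shuffle. This is not GCC code: the names build_shuffle_mask and shuffle_v4hi are hypothetical and exist only for illustration. The shifts mirror the ones in the quoted mmx_pshufw_1 output code, and the range check stands in for the const_0_to_3_operand predicate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack four 2-bit lane selectors into the 8-bit
   immediate, using the same shifts as the quoted insn output code.  */
static unsigned
build_shuffle_mask (unsigned s0, unsigned s1, unsigned s2, unsigned s3)
{
  /* Stand-in for const_0_to_3_operand: each selector must be 0..3.  */
  assert (s0 < 4 && s1 < 4 && s2 < 4 && s3 < 4);

  unsigned mask = 0;
  mask |= s0 << 0;
  mask |= s1 << 2;
  mask |= s2 << 4;
  mask |= s3 << 6;
  return mask;
}

/* Hypothetical scalar model of the shuffle: destination lane i receives
   the source lane named by bits [2*i+1 : 2*i] of the immediate.  A
   temporary is used so that dst and src may alias.  */
static void
shuffle_v4hi (uint16_t dst[4], const uint16_t src[4], unsigned mask)
{
  uint16_t tmp[4];
  for (int i = 0; i < 4; i++)
    tmp[i] = src[(mask >> (2 * i)) & 3];
  for (int i = 0; i < 4; i++)
    dst[i] = tmp[i];
}
```

The same immediate can be passed to either instruction because pshuflw applies it to the low four words of the XMM register and copies the upper 64 bits through unchanged, so the low 64 bits match what pshufw would produce in an MMX register.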