https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
For V2DImode arithmetic right shift, I think it would be something like:
--- gcc/config/i386/sse.md.jj   2021-01-27 11:50:09.168981297 +0100
+++ gcc/config/i386/sse.md      2021-02-05 14:32:44.175463716 +0100
@@ -20313,10 +20313,55 @@ (define_expand "ashrv2di3"
        (ashiftrt:V2DI
          (match_operand:V2DI 1 "register_operand")
          (match_operand:DI 2 "nonmemory_operand")))]
-  "TARGET_XOP || TARGET_AVX512VL"
+  "TARGET_SSE4_2"
 {
   if (!TARGET_AVX512VL)
     {
+      if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 63)
+       {
+         rtx zero = force_reg (V2DImode, CONST0_RTX (V2DImode));
+         emit_insn (gen_sse4_2_gtv2di3 (operands[0], zero, operands[1]));
+         DONE;
+       }
+      if (operands[2] == const0_rtx)
+       {
+         emit_move_insn (operands[0], operands[1]);
+         DONE;
+       }
+      if (!TARGET_XOP)
+       {
+         rtx zero = force_reg (V2DImode, CONST0_RTX (V2DImode));
+         rtx zero_or_all_ones = gen_reg_rtx (V2DImode);
+         emit_insn (gen_sse4_2_gtv2di3 (zero_or_all_ones, zero, operands[1]));
+         rtx lshr_res = gen_reg_rtx (V2DImode);
+         emit_insn (gen_lshrv2di3 (lshr_res, operands[1], operands[2]));
+         rtx ashl_res = gen_reg_rtx (V2DImode);
+         rtx amount;
+         if (CONST_INT_P (operands[2]))
+           amount = GEN_INT (64 - INTVAL (operands[2]));
+         else if (TARGET_64BIT)
+           {
+             amount = gen_reg_rtx (DImode);
+             emit_insn (gen_subdi3 (amount, force_reg (DImode, GEN_INT (64)),
+                                    operands[2]));
+           }
+         else
+           {
+             rtx temp = gen_reg_rtx (SImode);
+             emit_insn (gen_subsi3 (temp, force_reg (SImode, GEN_INT (64)),
+                                    lowpart_subreg (SImode, operands[2],
+                                                    DImode)));
+             amount = gen_reg_rtx (V4SImode);
+             emit_insn (gen_vec_setv4si_0 (amount, CONST0_RTX (V4SImode),
+                                           temp));
+           }
+         if (!CONST_INT_P (operands[2]))
+           amount = lowpart_subreg (DImode, amount, GET_MODE (amount));
+         emit_insn (gen_ashlv2di3 (ashl_res, zero_or_all_ones, amount));
+         emit_insn (gen_iorv2di3 (operands[0], lshr_res, ashl_res));
+         DONE;
+       }
+
       rtx reg = gen_reg_rtx (V2DImode);
       rtx par;
       bool negate = false;
plus an adjustment to the cost computation to reflect that V2DImode arithmetic
right shifts, at least those by counts other than 63, are more expensive.
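
To make the idea behind the new !TARGET_XOP path easier to check, here is a
scalar sketch of what the emitted sequence computes for a single 64-bit lane.
It is only an illustration under my own assumptions, not part of the patch:
the function names are hypothetical, the comments map each step to the
instruction I would expect (pcmpgtq, psrlq, psllq, por), and the final
conversion back to int64_t relies on GCC's modular behaviour for
out-of-range unsigned-to-signed conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Reference arithmetic right shift for 0 <= n <= 63, written so that no
   negative value is ever shifted: for negative x, ~x is non-negative and
   ~(~x >> n) equals floor (x / 2^n).  */
static int64_t
ashr64_ref (int64_t x, unsigned n)
{
  return x < 0 ? ~(~x >> n) : x >> n;
}

/* One lane of the emulation sketched in the patch (hypothetical helper).  */
static int64_t
ashr64_emulated (int64_t x, unsigned n)
{
  /* 0 > x yields all ones for negative lanes, zero otherwise (pcmpgtq).  */
  uint64_t zero_or_all_ones = (0 > x) ? ~UINT64_C (0) : 0;

  if (n == 63)
    /* Every result bit is a copy of the sign bit: the mask is the answer.  */
    return (int64_t) zero_or_all_ones;
  if (n == 0)
    /* Plain move; also avoids an undefined shift by 64 in C.  */
    return x;

  uint64_t lshr_res = (uint64_t) x >> n;              /* psrlq */
  uint64_t ashl_res = zero_or_all_ones << (64 - n);   /* psllq */
  return (int64_t) (lshr_res | ashl_res);             /* por */
}

/* Compare both versions for every shift count on a few sample values.  */
static void
ashr64_self_check (void)
{
  static const int64_t samples[] = { 0, 1, -1, 1024, -5, INT64_MAX, INT64_MIN };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    for (unsigned n = 0; n < 64; n++)
      assert (ashr64_emulated (samples[i], n) == ashr64_ref (samples[i], n));
}
```

The logical shift supplies the low 64 - n bits and the shifted mask fills the
top n bits with copies of the sign bit, so the OR of the two is the arithmetic
shift.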

Even if in the end the V2DImode arithmetic right shifts turn out to be more
expensive than scalar code (though that would surprise me, at least for the
>> 63 case), I think V4DImode for TARGET_AVX2 should always be beneficial
(I haven't tried to adjust the expander for that yet).
