On June 19, 2019 10:55:16 AM GMT+02:00, Jakub Jelinek <ja...@redhat.com> wrote:
>Hi!
>
>When VEC_[LR]SHIFT_EXPR was replaced with VEC_PERM_EXPR, vec_shl_optab
>was removed as unused, because we only used vec_shr_optab for the
>reductions.
>Without this patch the vect-simd-*.c tests can be vectorized just fine
>for SSE4 and above, but can't be with SSE2.  As the comment in
>tree-vect-stmts.c tries to explain, for the inclusive scan operation we
>want (when using V8SImode vectors):
>       _30 = MEM <vector(8) int> [(int *)&D.2043];
>       _31 = MEM <vector(8) int> [(int *)&D.2042];
>       _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
>       _33 = _31 + _32;
> // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
>       _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
>       _35 = _33 + _34;
>    // _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
>       //         _31[1]+.._31[4], ... _31[4]+.._31[7] };
>       _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
>       _37 = _35 + _36;
>    // _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
>       //         _31[0]+.._31[4], ... _31[0]+.._31[7] };
>       _38 = _30 + _37;
>       _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
>       MEM <vector(8) int> [(int *)&D.2043] = _39;
>       MEM <vector(8) int> [(int *)&D.2042] = _38;  */
>For V4SImode vectors that would be VEC_PERM_EXPR <x, init, { 4, 0, 1, 2 }>,
>VEC_PERM_EXPR <x2, init, { 4, 5, 0, 1 }> and
>VEC_PERM_EXPR <x3, init, { 3, 3, 3, 3 }> etc.
>Unfortunately, SSE2 can't do the VEC_PERM_EXPR <x, init, { 4, 0, 1, 2 }>
>permutation (the other two it can do).  Well, to be precise, it can do it
>using the vector left shift which has been removed as unused, provided
>that init is initializer_zerop (shifting in all zeros from the left).
>init usually is all zeros, as that is the neutral element of additive
>reductions and of a couple of others too; in the unlikely case that some
>other reduction is used with scan (multiplication, minimum, maximum,
>bitwise and), we can use a VEC_COND_EXPR with a constant first argument,
>i.e. a blend or and/or.
>
>So, this patch reintroduces vec_shl_optab (most backends actually have
>those patterns already) and handles its expansion and generic vector
>lowering similarly to vec_shr_optab - i.e. it is a VEC_PERM_EXPR where
>the first operand is initializer_zerop and the third operand starts with
>a few numbers smaller than the number of elements (it doesn't matter
>which ones, as all elements are the same - zero), followed by nelts,
>nelts+1, nelts+2, ...
>Unlike vec_shr_optab, which has zero as the second operand, this one has
>it as the first operand, because VEC_PERM_EXPR canonicalization wants the
>first selector element to be smaller than the number of elements.  And
>unlike vec_shr_optab, where we also have a fallback in
>have_whole_vector_shift using normal permutations, this one doesn't need
>one; that "fallback" is tried first, before vec_shl_optab.
>
>For the vec_shl_optab checks, it tests only for vectors with a constant
>number of elements; I'm not really sure whether our VECTOR_CST encoding
>can express the left shifts in any way, nor whether SVE supports those
>(I see aarch64 has vec_shl_insert, but that is just a fixed shift by
>element bits and shifts in a scalar rather than zeros).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok. 

Richard. 

>2019-06-19  Jakub Jelinek  <ja...@redhat.com>
>
>       * doc/md.texi: Document vec_shl_<mode> pattern.
>       * optabs.def (vec_shl_optab): New optab.
>       * optabs.c (shift_amt_for_vec_perm_mask): Add shift_optab
>       argument, if == vec_shl_optab, check for left whole vector shift
>       pattern rather than right shift.
>       (expand_vec_perm_const): Add vec_shl_optab support.
>       * optabs-query.c (can_vec_perm_var_p): Mention also vec_shl optab
>       in the comment.
>       * tree-vect-generic.c (lower_vec_perm): Support permutations which
>       can be handled by vec_shl_optab.
>       * tree-vect-stmts.c (scan_store_can_perm_p): New function.
>       (check_scan_store): Use it.
>       (vectorizable_scan_store): If target can't do normal permutations,
>       try to use whole vector left shifts and if needed a VEC_COND_EXPR
>       after it.
>       * config/i386/sse.md (vec_shl_<mode>): New expander.
>
>       * gcc.dg/vect/vect-simd-8.c: If main is defined, don't include
>       tree-vect.h nor call check_vect.
>       * gcc.dg/vect/vect-simd-9.c: Likewise.
>       * gcc.dg/vect/vect-simd-10.c: New test.
>       * gcc.target/i386/sse2-vect-simd-8.c: New test.
>       * gcc.target/i386/sse2-vect-simd-9.c: New test.
>       * gcc.target/i386/sse2-vect-simd-10.c: New test.
>       * gcc.target/i386/avx2-vect-simd-8.c: New test.
>       * gcc.target/i386/avx2-vect-simd-9.c: New test.
>       * gcc.target/i386/avx2-vect-simd-10.c: New test.
>       * gcc.target/i386/avx512f-vect-simd-8.c: New test.
>       * gcc.target/i386/avx512f-vect-simd-9.c: New test.
>       * gcc.target/i386/avx512f-vect-simd-10.c: New test.
>
>--- gcc/doc/md.texi.jj 2019-06-13 00:35:43.518942525 +0200
>+++ gcc/doc/md.texi    2019-06-18 15:32:38.496629946 +0200
>@@ -5454,6 +5454,14 @@ in operand 2.  Store the result in vecto
> 0 and 1 have mode @var{m} and operand 2 has the mode appropriate for
> one element of @var{m}.
> 
>+@cindex @code{vec_shl_@var{m}} instruction pattern
>+@item @samp{vec_shl_@var{m}}
>+Whole vector left shift in bits, i.e.@: away from element 0.
>+Operand 1 is a vector to be shifted.
>+Operand 2 is an integer shift amount in bits.
>+Operand 0 is where the resulting shifted vector is stored.
>+The output and input vectors should have the same modes.
>+
> @cindex @code{vec_shr_@var{m}} instruction pattern
> @item @samp{vec_shr_@var{m}}
> Whole vector right shift in bits, i.e.@: towards element 0.
>--- gcc/optabs.def.jj  2019-02-11 11:38:08.263617017 +0100
>+++ gcc/optabs.def     2019-06-18 14:56:57.934971410 +0200
>@@ -348,6 +348,7 @@ OPTAB_D (vec_packu_float_optab, "vec_pac
> OPTAB_D (vec_perm_optab, "vec_perm$a")
> OPTAB_D (vec_realign_load_optab, "vec_realign_load_$a")
> OPTAB_D (vec_set_optab, "vec_set$a")
>+OPTAB_D (vec_shl_optab, "vec_shl_$a")
> OPTAB_D (vec_shr_optab, "vec_shr_$a")
> OPTAB_D (vec_unpack_sfix_trunc_hi_optab, "vec_unpack_sfix_trunc_hi_$a")
> OPTAB_D (vec_unpack_sfix_trunc_lo_optab, "vec_unpack_sfix_trunc_lo_$a")
>--- gcc/optabs.c.jj    2019-02-13 13:11:47.927612362 +0100
>+++ gcc/optabs.c       2019-06-18 16:45:29.347895585 +0200
>@@ -5444,19 +5444,45 @@ vector_compare_rtx (machine_mode cmp_mod
> }
> 
> /* Check if vec_perm mask SEL is a constant equivalent to a shift of
>-   the first vec_perm operand, assuming the second operand is a constant
>-   vector of zeros.  Return the shift distance in bits if so, or NULL_RTX
>-   if the vec_perm is not a shift.  MODE is the mode of the value being
>-   shifted.  */
>+   the first vec_perm operand, assuming the second operand (for left shift
>+   first operand) is a constant vector of zeros.  Return the shift distance
>+   in bits if so, or NULL_RTX if the vec_perm is not a shift.  MODE is the
>+   mode of the value being shifted.  SHIFT_OPTAB is vec_shr_optab for right
>+   shift or vec_shl_optab for left shift.  */
> static rtx
>-shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel)
>+shift_amt_for_vec_perm_mask (machine_mode mode, const vec_perm_indices &sel,
> {
>   unsigned int bitsize = GET_MODE_UNIT_BITSIZE (mode);
>   poly_int64 first = sel[0];
>   if (maybe_ge (sel[0], GET_MODE_NUNITS (mode)))
>     return NULL_RTX;
> 
>-  if (!sel.series_p (0, 1, first, 1))
>+  if (shift_optab == vec_shl_optab)
>+    {
>+      unsigned int nelt;
>+      if (!GET_MODE_NUNITS (mode).is_constant (&nelt))
>+      return NULL_RTX;
>+      unsigned firstidx = 0;
>+      for (unsigned int i = 0; i < nelt; i++)
>+      {
>+        if (known_eq (sel[i], nelt))
>+          {
>+            if (i == 0 || firstidx)
>+              return NULL_RTX;
>+            firstidx = i;
>+          }
>+        else if (firstidx
>+                 ? maybe_ne (sel[i], nelt + i - firstidx)
>+                 : maybe_ge (sel[i], nelt))
>+          return NULL_RTX;
>+      }
>+
>+      if (firstidx == 0)
>+      return NULL_RTX;
>+      first = firstidx;
>+    }
>+  else if (!sel.series_p (0, 1, first, 1))
>     {
>       unsigned int nelt;
>       if (!GET_MODE_NUNITS (mode).is_constant (&nelt))
>@@ -5544,25 +5570,37 @@ expand_vec_perm_const (machine_mode mode
>      target instruction.  */
>   vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode));
> 
>-  /* See if this can be handled with a vec_shr.  We only do this if the
>-     second vector is all zeroes.  */
>-  insn_code shift_code = optab_handler (vec_shr_optab, mode);
>-  insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
>-                           ? optab_handler (vec_shr_optab, qimode)
>-                           : CODE_FOR_nothing);
>-
>-  if (v1 == CONST0_RTX (GET_MODE (v1))
>-      && (shift_code != CODE_FOR_nothing
>-        || shift_code_qi != CODE_FOR_nothing))
>+  /* See if this can be handled with a vec_shr or vec_shl.  We only do this
>+     if the second (for vec_shr) or first (for vec_shl) vector is all
>+     zeroes.  */
>+  insn_code shift_code = CODE_FOR_nothing;
>+  insn_code shift_code_qi = CODE_FOR_nothing;
>+  optab shift_optab = unknown_optab;
>+  rtx v2 = v0;
>+  if (v1 == CONST0_RTX (GET_MODE (v1)))
>+    shift_optab = vec_shr_optab;
>+  else if (v0 == CONST0_RTX (GET_MODE (v0)))
>+    {
>+      shift_optab = vec_shl_optab;
>+      v2 = v1;
>+    }
>+  if (shift_optab != unknown_optab)
>+    {
>+      shift_code = optab_handler (shift_optab, mode);
>+      shift_code_qi = ((qimode != VOIDmode && qimode != mode)
>+                     ? optab_handler (shift_optab, qimode)
>+                     : CODE_FOR_nothing);
>+    }
>+  if (shift_code != CODE_FOR_nothing || shift_code_qi != CODE_FOR_nothing)
>     {
>-      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices);
>+      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices, shift_optab);
>       if (shift_amt)
>       {
>         struct expand_operand ops[3];
>         if (shift_code != CODE_FOR_nothing)
>           {
>             create_output_operand (&ops[0], target, mode);
>-            create_input_operand (&ops[1], v0, mode);
>+            create_input_operand (&ops[1], v2, mode);
>             create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
>             if (maybe_expand_insn (shift_code, 3, ops))
>               return ops[0].value;
>@@ -5571,7 +5609,7 @@ expand_vec_perm_const (machine_mode mode
>           {
>             rtx tmp = gen_reg_rtx (qimode);
>             create_output_operand (&ops[0], tmp, qimode);
>-            create_input_operand (&ops[1], gen_lowpart (qimode, v0), qimode);
>+            create_input_operand (&ops[1], gen_lowpart (qimode, v2), qimode);
>             create_convert_operand_from_type (&ops[2], shift_amt, sizetype);
>             if (maybe_expand_insn (shift_code_qi, 3, ops))
>               return gen_lowpart (mode, ops[0].value);
>--- gcc/optabs-query.c.jj      2019-05-20 11:40:16.691121967 +0200
>+++ gcc/optabs-query.c 2019-06-18 15:26:53.028980804 +0200
>@@ -415,8 +415,9 @@ can_vec_perm_var_p (machine_mode mode)
>    permute (if the target supports that).
> 
>    Note that additional permutations representing whole-vector shifts may
>-   also be handled via the vec_shr optab, but only where the second input
>-   vector is entirely constant zeroes; this case is not dealt with here.  */
>+   also be handled via the vec_shr or vec_shl optab, but only where the
>+   second input vector is entirely constant zeroes; this case is not dealt
>+   with here.  */
> 
> bool
> can_vec_perm_const_p (machine_mode mode, const vec_perm_indices &sel,
>--- gcc/tree-vect-generic.c.jj 2019-01-07 09:47:32.988518893 +0100
>+++ gcc/tree-vect-generic.c    2019-06-18 16:35:29.033319526 +0200
>@@ -1367,6 +1367,32 @@ lower_vec_perm (gimple_stmt_iterator *gs
>             return;
>           }
>       }
>+      /* And similarly vec_shl pattern.  */
>+      if (optab_handler (vec_shl_optab, TYPE_MODE (vect_type))
>+        != CODE_FOR_nothing
>+        && TREE_CODE (vec0) == VECTOR_CST
>+        && initializer_zerop (vec0))
>+      {
>+        unsigned int first = 0;
>+        for (i = 0; i < elements; ++i)
>+          if (known_eq (poly_uint64 (indices[i]), elements))
>+            {
>+              if (i == 0 || first)
>+                break;
>+              first = i;
>+            }
>+          else if (first
>+                   ? maybe_ne (poly_uint64 (indices[i]),
>+                                            elements + i - first)
>+                   : maybe_ge (poly_uint64 (indices[i]), elements))
>+            break;
>+        if (i == elements)
>+          {
>+            gimple_assign_set_rhs3 (stmt, mask);
>+            update_stmt (stmt);
>+            return;
>+          }
>+      }
>     }
>   else if (can_vec_perm_var_p (TYPE_MODE (vect_type)))
>     return;
>--- gcc/tree-vect-stmts.c.jj   2019-06-17 23:18:53.620850072 +0200
>+++ gcc/tree-vect-stmts.c      2019-06-18 17:43:27.484350807 +0200
>@@ -6356,6 +6356,71 @@ scan_operand_equal_p (tree ref1, tree re
> 
> /* Function check_scan_store.
> 
>+   Verify if we can perform the needed permutations or whole vector shifts.
>+   Return -1 on failure, otherwise exact log2 of vectype's nunits.  */
>+
>+static int
>+scan_store_can_perm_p (tree vectype, tree init, int *use_whole_vector_p = NULL)
>+{
>+  enum machine_mode vec_mode = TYPE_MODE (vectype);
>+  unsigned HOST_WIDE_INT nunits;
>+  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
>+    return -1;
>+  int units_log2 = exact_log2 (nunits);
>+  if (units_log2 <= 0)
>+    return -1;
>+
>+  int i;
>+  for (i = 0; i <= units_log2; ++i)
>+    {
>+      unsigned HOST_WIDE_INT j, k;
>+      vec_perm_builder sel (nunits, nunits, 1);
>+      sel.quick_grow (nunits);
>+      if (i == 0)
>+      {
>+        for (j = 0; j < nunits; ++j)
>+          sel[j] = nunits - 1;
>+      }
>+      else
>+      {
>+        for (j = 0; j < (HOST_WIDE_INT_1U << (i - 1)); ++j)
>+          sel[j] = j;
>+        for (k = 0; j < nunits; ++j, ++k)
>+          sel[j] = nunits + k;
>+      }
>+      vec_perm_indices indices (sel, i == 0 ? 1 : 2, nunits);
>+      if (!can_vec_perm_const_p (vec_mode, indices))
>+      break;
>+    }
>+
>+  if (i == 0)
>+    return -1;
>+
>+  if (i <= units_log2)
>+    {
>+      if (optab_handler (vec_shl_optab, vec_mode) == CODE_FOR_nothing)
>+      return -1;
>+      int kind = 1;
>+      /* Whole vector shifts shift in zeros, so if init is all zero constant,
>+       there is no need to do anything further.  */
>+      if ((TREE_CODE (init) != INTEGER_CST
>+         && TREE_CODE (init) != REAL_CST)
>+        || !initializer_zerop (init))
>+      {
>+        tree masktype = build_same_sized_truth_vector_type (vectype);
>+        if (!expand_vec_cond_expr_p (vectype, masktype, VECTOR_CST))
>+          return -1;
>+        kind = 2;
>+      }
>+      if (use_whole_vector_p)
>+      *use_whole_vector_p = kind;
>+    }
>+  return units_log2;
>+}
>+
>+
>+/* Function check_scan_store.
>+
> Check magic stores for #pragma omp scan {in,ex}clusive reductions.  */
> 
> static bool
>@@ -6596,34 +6661,9 @@ check_scan_store (stmt_vec_info stmt_inf
>   if (!optab || optab_handler (optab, vec_mode) == CODE_FOR_nothing)
>     goto fail;
> 
>-  unsigned HOST_WIDE_INT nunits;
>-  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
>+  int units_log2 = scan_store_can_perm_p (vectype, *init);
>+  if (units_log2 == -1)
>     goto fail;
>-  int units_log2 = exact_log2 (nunits);
>-  if (units_log2 <= 0)
>-    goto fail;
>-
>-  for (int i = 0; i <= units_log2; ++i)
>-    {
>-      unsigned HOST_WIDE_INT j, k;
>-      vec_perm_builder sel (nunits, nunits, 1);
>-      sel.quick_grow (nunits);
>-      if (i == units_log2)
>-      {
>-        for (j = 0; j < nunits; ++j)
>-          sel[j] = nunits - 1;
>-      }
>-      else
>-      {
>-        for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
>-          sel[j] = nunits + j;
>-        for (k = 0; j < nunits; ++j, ++k)
>-          sel[j] = k;
>-      }
>-      vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
>-      if (!can_vec_perm_const_p (vec_mode, indices))
>-      goto fail;
>-    }
> 
>   return true;
> }
>@@ -6686,7 +6726,8 @@ vectorizable_scan_store (stmt_vec_info s
>   unsigned HOST_WIDE_INT nunits;
>   if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
>     gcc_unreachable ();
>-  int units_log2 = exact_log2 (nunits);
>+  int use_whole_vector_p = 0;
>+  int units_log2 = scan_store_can_perm_p (vectype, *init, &use_whole_vector_p);
>   gcc_assert (units_log2 > 0);
>   auto_vec<tree, 16> perms;
>   perms.quick_grow (units_log2 + 1);
>@@ -6696,21 +6737,25 @@ vectorizable_scan_store (stmt_vec_info s
>       vec_perm_builder sel (nunits, nunits, 1);
>       sel.quick_grow (nunits);
>       if (i == units_log2)
>-      {
>-        for (j = 0; j < nunits; ++j)
>-          sel[j] = nunits - 1;
>-      }
>-      else
>-      {
>-        for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
>-          sel[j] = nunits + j;
>-        for (k = 0; j < nunits; ++j, ++k)
>-          sel[j] = k;
>-      }
>+      for (j = 0; j < nunits; ++j)
>+        sel[j] = nunits - 1;
>+      else
>+        {
>+          for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
>+            sel[j] = j;
>+          for (k = 0; j < nunits; ++j, ++k)
>+            sel[j] = nunits + k;
>+        }
>       vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
>-      perms[i] = vect_gen_perm_mask_checked (vectype, indices);
>+      if (use_whole_vector_p && i < units_log2)
>+      perms[i] = vect_gen_perm_mask_any (vectype, indices);
>+      else
>+      perms[i] = vect_gen_perm_mask_checked (vectype, indices);
>     }
> 
>+  tree zero_vec = use_whole_vector_p ? build_zero_cst (vectype) : NULL_TREE;
>+  tree masktype = (use_whole_vector_p == 2
>+                 ? build_same_sized_truth_vector_type (vectype) : NULL_TREE);
>   stmt_vec_info prev_stmt_info = NULL;
>   tree vec_oprnd1 = NULL_TREE;
>   tree vec_oprnd2 = NULL_TREE;
>@@ -6742,8 +6787,9 @@ vectorizable_scan_store (stmt_vec_info s
>       for (int i = 0; i < units_log2; ++i)
>       {
>         tree new_temp = make_ssa_name (vectype);
>-        gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR, v,
>-                                         vec_oprnd1, perms[i]);
>+        gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR,
>+                                         zero_vec ? zero_vec : vec_oprnd1, v,
>+                                         perms[i]);
>         new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
>         if (prev_stmt_info == NULL)
>           STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt_info;
>@@ -6751,6 +6797,25 @@ vectorizable_scan_store (stmt_vec_info s
>           STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
>         prev_stmt_info = new_stmt_info;
> 
>+        if (use_whole_vector_p == 2)
>+          {
>+            /* Whole vector shift shifted in zero bits, but if *init
>+               is not initializer_zerop, we need to replace those elements
>+               with elements from vec_oprnd1.  */
>+            tree_vector_builder vb (masktype, nunits, 1);
>+            for (unsigned HOST_WIDE_INT k = 0; k < nunits; ++k)
>+              vb.quick_push (k < (HOST_WIDE_INT_1U << i)
>+                             ? boolean_false_node : boolean_true_node);
>+
>+            tree new_temp2 = make_ssa_name (vectype);
>+            g = gimple_build_assign (new_temp2, VEC_COND_EXPR, vb.build (),
>+                                     new_temp, vec_oprnd1);
>+            new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
>+            STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
>+            prev_stmt_info = new_stmt_info;
>+            new_temp = new_temp2;
>+          }
>+
>         tree new_temp2 = make_ssa_name (vectype);
>         g = gimple_build_assign (new_temp2, code, v, new_temp);
>         new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
>--- gcc/config/i386/sse.md.jj  2019-06-17 23:18:26.821267440 +0200
>+++ gcc/config/i386/sse.md     2019-06-18 15:37:28.342043528 +0200
>@@ -11758,6 +11758,19 @@ (define_insn "<shift_insn><mode>3<mask_n
>    (set_attr "mode" "<sseinsnmode>")])
> 
> 
>+(define_expand "vec_shl_<mode>"
>+  [(set (match_dup 3)
>+      (ashift:V1TI
>+       (match_operand:VI_128 1 "register_operand")
>+       (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
>+   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
>+  "TARGET_SSE2"
>+{
>+  operands[1] = gen_lowpart (V1TImode, operands[1]);
>+  operands[3] = gen_reg_rtx (V1TImode);
>+  operands[4] = gen_lowpart (<MODE>mode, operands[3]);
>+})
>+
> (define_expand "vec_shr_<mode>"
>   [(set (match_dup 3)
>       (lshiftrt:V1TI
>--- gcc/testsuite/gcc.dg/vect/vect-simd-8.c.jj 2019-06-17 23:18:53.621850057 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-simd-8.c    2019-06-18 18:02:09.428798006 +0200
>@@ -3,7 +3,9 @@
> /* { dg-additional-options "-mavx" { target avx_runtime } } */
> /* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" { target i?86-*-* x86_64-*-* } } } */
> 
>+#ifndef main
> #include "tree-vect.h"
>+#endif
> 
> int r, a[1024], b[1024];
> 
>@@ -63,7 +65,9 @@ int
> main ()
> {
>   int s = 0;
>+#ifndef main
>   check_vect ();
>+#endif
>   for (int i = 0; i < 1024; ++i)
>     {
>       a[i] = i;
>--- gcc/testsuite/gcc.dg/vect/vect-simd-9.c.jj 2019-06-17 23:18:53.621850057 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-simd-9.c    2019-06-18 18:02:34.649406773 +0200
>@@ -3,7 +3,9 @@
> /* { dg-additional-options "-mavx" { target avx_runtime } } */
> /* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" { target i?86-*-* x86_64-*-* } } } */
> 
>+#ifndef main
> #include "tree-vect.h"
>+#endif
> 
> int r, a[1024], b[1024];
> 
>@@ -65,7 +67,9 @@ int
> main ()
> {
>   int s = 0;
>+#ifndef main
>   check_vect ();
>+#endif
>   for (int i = 0; i < 1024; ++i)
>     {
>       a[i] = i;
>--- gcc/testsuite/gcc.dg/vect/vect-simd-10.c.jj        2019-06-18 18:37:30.742838613 +0200
>+++ gcc/testsuite/gcc.dg/vect/vect-simd-10.c   2019-06-18 19:44:20.614082076 +0200
>@@ -0,0 +1,96 @@
>+/* { dg-require-effective-target size32plus } */
>+/* { dg-additional-options "-fopenmp-simd" } */
>+/* { dg-additional-options "-mavx" { target avx_runtime } } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" { target i?86-*-* x86_64-*-* } } } */
>+
>+#ifndef main
>+#include "tree-vect.h"
>+#endif
>+
>+float r = 1.0f, a[1024], b[1024];
>+
>+__attribute__((noipa)) void
>+foo (float *a, float *b)
>+{
>+  #pragma omp simd reduction (inscan, *:r)
>+  for (int i = 0; i < 1024; i++)
>+    {
>+      r *= a[i];
>+      #pragma omp scan inclusive(r)
>+      b[i] = r;
>+    }
>+}
>+
>+__attribute__((noipa)) float
>+bar (void)
>+{
>+  float s = -__builtin_inff ();
>+  #pragma omp simd reduction (inscan, max:s)
>+  for (int i = 0; i < 1024; i++)
>+    {
>+      s = s > a[i] ? s : a[i];
>+      #pragma omp scan inclusive(s)
>+      b[i] = s;
>+    }
>+  return s;
>+}
>+
>+int
>+main ()
>+{
>+  float s = 1.0f;
>+#ifndef main
>+  check_vect ();
>+#endif
>+  for (int i = 0; i < 1024; ++i)
>+    {
>+      if (i < 80)
>+      a[i] = (i & 1) ? 0.25f : 0.5f;
>+      else if (i < 200)
>+      a[i] = (i % 3) == 0 ? 2.0f : (i % 3) == 1 ? 4.0f : 1.0f;
>+      else if (i < 280)
>+      a[i] = (i & 1) ? 0.25f : 0.5f;
>+      else if (i < 380)
>+      a[i] = (i % 3) == 0 ? 2.0f : (i % 3) == 1 ? 4.0f : 1.0f;
>+      else
>+      switch (i % 6)
>+        {
>+        case 0: a[i] = 0.25f; break;
>+        case 1: a[i] = 2.0f; break;
>+        case 2: a[i] = -1.0f; break;
>+        case 3: a[i] = -4.0f; break;
>+        case 4: a[i] = 0.5f; break;
>+        case 5: a[i] = 1.0f; break;
>+        default: a[i] = 0.0f; break;
>+        }
>+      b[i] = -19.0f;
>+      asm ("" : "+g" (i));
>+    }
>+  foo (a, b);
>+  if (r * 16384.0f != 0.125f)
>+    abort ();
>+  float m = -175.25f;
>+  for (int i = 0; i < 1024; ++i)
>+    {
>+      s *= a[i];
>+      if (b[i] != s)
>+      abort ();
>+      else
>+      {
>+        a[i] = m - ((i % 3) == 1 ? 2.0f : (i % 3) == 2 ? 4.0f : 0.0f);
>+        b[i] = -231.75f;
>+        m += 0.75f;
>+      }
>+    }
>+  if (bar () != 592.0f)
>+    abort ();
>+  s = -__builtin_inff ();
>+  for (int i = 0; i < 1024; ++i)
>+    {
>+      if (s < a[i])
>+      s = a[i];
>+      if (b[i] != s)
>+      abort ();
>+    }
>+  return 0;
>+}
>--- gcc/testsuite/gcc.target/i386/sse2-vect-simd-8.c.jj        2019-06-18 17:59:27.182314827 +0200
>+++ gcc/testsuite/gcc.target/i386/sse2-vect-simd-8.c   2019-06-18 18:19:48.417341734 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -msse2 -mno-sse3 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target sse2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "sse2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-8.c"
>+
>+static void
>+sse2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/sse2-vect-simd-9.c.jj        2019-06-18 18:03:30.174545446 +0200
>+++ gcc/testsuite/gcc.target/i386/sse2-vect-simd-9.c   2019-06-18 18:20:05.770072628 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -msse2 -mno-sse3 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target sse2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "sse2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-9.c"
>+
>+static void
>+sse2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/sse2-vect-simd-10.c.jj       2019-06-18 19:46:09.015410603 +0200
>+++ gcc/testsuite/gcc.target/i386/sse2-vect-simd-10.c  2019-06-18 19:50:31.621361409 +0200
>@@ -0,0 +1,15 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -msse2 -mno-sse3 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target sse2 } */
>+
>+#include "sse2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-10.c"
>+
>+static void
>+sse2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx2-vect-simd-8.c.jj        2019-06-18 17:59:27.182314827 +0200
>+++ gcc/testsuite/gcc.target/i386/avx2-vect-simd-8.c   2019-06-18 18:19:40.310467451 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx2 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-8.c"
>+
>+static void
>+avx2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx2-vect-simd-9.c.jj        2019-06-18 18:03:30.174545446 +0200
>+++ gcc/testsuite/gcc.target/i386/avx2-vect-simd-9.c   2019-06-18 18:19:56.479216712 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx2 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-9.c"
>+
>+static void
>+avx2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx2-vect-simd-10.c.jj       2019-06-18 19:50:47.692113611 +0200
>+++ gcc/testsuite/gcc.target/i386/avx2-vect-simd-10.c  2019-06-18 19:50:56.180982721 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx2 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx2 } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx2-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-10.c"
>+
>+static void
>+avx2_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx512f-vect-simd-8.c.jj     2019-06-18 17:59:27.182314827 +0200
>+++ gcc/testsuite/gcc.target/i386/avx512f-vect-simd-8.c        2019-06-18 18:19:44.364404586 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx512f -mprefer-vector-width=512 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx512f } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx512f-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-8.c"
>+
>+static void
>+avx512f_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx512f-vect-simd-9.c.jj     2019-06-18 18:03:30.174545446 +0200
>+++ gcc/testsuite/gcc.target/i386/avx512f-vect-simd-9.c        2019-06-18 18:20:00.884148400 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx512f -mprefer-vector-width=512 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx512f } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx512f-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-9.c"
>+
>+static void
>+avx512f_test (void)
>+{
>+  do_main ();
>+}
>--- gcc/testsuite/gcc.target/i386/avx512f-vect-simd-10.c.jj    2019-06-18 19:51:12.309734025 +0200
>+++ gcc/testsuite/gcc.target/i386/avx512f-vect-simd-10.c       2019-06-18 19:51:18.285641883 +0200
>@@ -0,0 +1,16 @@
>+/* { dg-do run } */
>+/* { dg-options "-O2 -fopenmp-simd -mavx512f -mprefer-vector-width=512 -fdump-tree-vect-details" } */
>+/* { dg-require-effective-target avx512f } */
>+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 2 "vect" } } */
>+
>+#include "avx512f-check.h"
>+
>+#define main() do_main ()
>+
>+#include "../../gcc.dg/vect/vect-simd-10.c"
>+
>+static void
>+avx512f_test (void)
>+{
>+  do_main ();
>+}
>
>       Jakub
