On Thu, 30 Oct 2014, Richard Biener wrote:

On Thu, 30 Oct 2014, Jakub Jelinek wrote:

On Thu, Oct 30, 2014 at 09:56:32AM +0100, Richard Biener wrote:

The following patch makes fold_ternary no longer make
VEC_PERMs valid for the target invalid.  As pointed out
in the PR we only need to make sure this doesn't happen
after vector lowering.

Well, even if you do that before vector lowering, if the original
mask is fine and new one is not, you'd seriously pessimize code.

Note that what fold does here isn't very elaborate - it just
tries hard to make a two-input VEC_PERM a one-input one which
is good for canonicalization and CSE.  I'd say that the
optabs.c code should ideally be able to recover the working
variant (or the target expanders should be more clever...).

It may be hard for optabs.c. The target expanders should really be fixed. This isn't the same as when we were talking of combining any permutations, here it is something that should be fairly easy to do. In that sense, simply avoiding the ICE (at least in release mode) and leaving optimization to the targets should be enough. Still, I am proposing something closer to Jakub's suggestion below.

How about moving the VEC_PERM_EXPR arg2 == VECTOR_CST folding
into a separate function with single_arg argument, call it with
operand_equal_p (op0, op1, 0) initially and call that function again
if single_arg and !can_vec_perm_p (...), that time with
single_arg parameter false?

If the original permutation is !can_vec_perm_p, I believe we should transform. That can indeed be done with a function. I did it without a function, but I can try to rewrite it if you want.

Or how about removing the code instead and doing it during
vector lowering if the original permute is not !can_vec_perm_p?

I'd rather not delay that much.


Here is a proposed patch that passed bootstrap+testsuite on x86_64-linux-gnu. I did not test on arm (or whichever platform it was that failed). If can_vec_perm_p is costly, we can test single_arg first (otherwise the 2 can_vec_perm_p must be the same) and test PROP_gimple_lvec before the second can_vec_perm_p (which must answer true then).

I don't remember what the arg2 == op2 test is about, so I kept it. I also didn't try to fix TREE_SIDE_EFFECTS handling, a quick test didn't trigger that issue.

2014-11-03  Marc Glisse  <marc.gli...@inria.fr>

        * fold-const.c: Include "optabs.h".
        (fold_ternary_loc) <VEC_PERM_EXPR>: Avoid canonicalizing a
        can_vec_perm_p permutation to one that is not.

--
Marc Glisse
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c    (revision 217007)
+++ gcc/fold-const.c    (working copy)
@@ -75,20 +75,21 @@ along with GCC; see the file COPYING3.
 #include "gimple.h"
 #include "gimplify.h"
 #include "tree-dfa.h"
 #include "hash-table.h"  /* Required for ENABLE_FOLD_CHECKING.  */
 #include "builtins.h"
 #include "hash-map.h"
 #include "plugin-api.h"
 #include "ipa-ref.h"
 #include "cgraph.h"
 #include "generic-match.h"
+#include "optabs.h"
 
 /* Nonzero if we are folding constants inside an initializer; zero
    otherwise.  */
 int folding_initializer = 0;
 
 /* The following constants represent a bit based encoding of GCC's
    comparison operators.  This encoding simplifies transformations
    on relational comparison operators, such as AND and OR.  */
 enum comparison_code {
   COMPCODE_FALSE = 0,
@@ -14189,47 +14190,47 @@ fold_ternary_loc (location_t loc, enum t
        return fold_build2_loc (loc, PLUS_EXPR, type,
                                const_binop (MULT_EXPR, arg0, arg1), arg2);
       if (integer_zerop (arg2))
        return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
 
       return fold_fma (loc, type, arg0, arg1, arg2);
 
     case VEC_PERM_EXPR:
       if (TREE_CODE (arg2) == VECTOR_CST)
        {
-         unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask;
+         unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask, mask2;
          unsigned char *sel = XALLOCAVEC (unsigned char, nelts);
+         unsigned char *sel2 = XALLOCAVEC (unsigned char, nelts);
          bool need_mask_canon = false;
+         bool need_mask_canon2 = false;
          bool all_in_vec0 = true;
          bool all_in_vec1 = true;
          bool maybe_identity = true;
          bool single_arg = (op0 == op1);
          bool changed = false;
 
          mask = single_arg ? (nelts - 1) : (2 * nelts - 1);
+         mask2 = 2 * nelts - 1;
          gcc_assert (nelts == VECTOR_CST_NELTS (arg2));
          for (i = 0; i < nelts; i++)
            {
              tree val = VECTOR_CST_ELT (arg2, i);
              if (TREE_CODE (val) != INTEGER_CST)
                return NULL_TREE;
 
              /* Make sure that the perm value is in an acceptable
                 range.  */
              wide_int t = val;
-             if (wi::gtu_p (t, mask))
-               {
-                 need_mask_canon = true;
-                 sel[i] = t.to_uhwi () & mask;
-               }
-             else
-               sel[i] = t.to_uhwi ();
+             need_mask_canon |= wi::gtu_p (t, mask);
+             need_mask_canon2 |= wi::gtu_p (t, mask2);
+             sel[i] = t.to_uhwi () & mask;
+             sel2[i] = t.to_uhwi () & mask2;
 
              if (sel[i] < nelts)
                all_in_vec1 = false;
              else
                all_in_vec0 = false;
 
              if ((sel[i] & (nelts-1)) != i)
                maybe_identity = false;
            }
 
@@ -14257,20 +14258,31 @@ fold_ternary_loc (location_t loc, enum t
                  || TREE_CODE (op1) == CONSTRUCTOR))
            {
              tree t = fold_vec_perm (type, op0, op1, sel);
              if (t != NULL_TREE)
                return t;
            }
 
          if (op0 == op1 && !single_arg)
            changed = true;
 
+         /* Some targets are deficient and fail to expand a single
+            argument permutation while still allowing an equivalent
+            2-argument version.  */
+         if (need_mask_canon && arg2 == op2
+             && !can_vec_perm_p (TYPE_MODE (type), false, sel)
+             && can_vec_perm_p (TYPE_MODE (type), false, sel2))
+           {
+             need_mask_canon = need_mask_canon2;
+             sel = sel2;
+           }
+
          if (need_mask_canon && arg2 == op2)
            {
              tree *tsel = XALLOCAVEC (tree, nelts);
              tree eltype = TREE_TYPE (TREE_TYPE (arg2));
              for (i = 0; i < nelts; i++)
                tsel[i] = build_int_cst (eltype, sel[i]);
              op2 = build_vector (TREE_TYPE (arg2), tsel);
              changed = true;
            }
 

Reply via email to