On Sun, 2 Nov 2014, Marc Glisse wrote:
On Thu, 30 Oct 2014, Richard Biener wrote:
On Thu, 30 Oct 2014, Jakub Jelinek wrote:
On Thu, Oct 30, 2014 at 09:56:32AM +0100, Richard Biener wrote:
The following patch stops fold_ternary from turning VEC_PERMs that
are valid for the target into invalid ones. As pointed out
in the PR, we only need to make sure this doesn't happen
after vector lowering.
Well, even if you do that before vector lowering, if the original
mask is fine and new one is not, you'd seriously pessimize code.
Note that what fold does here isn't very elaborate - it just
tries hard to make a two-input VEC_PERM a one-input one which
is good for canonicalization and CSE. I'd say that the
optabs.c code should ideally be able to recover the working
variant (or the target expanders should be more clever...).
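For illustration, the canonicalization described above can be sketched in C on a toy model. This is not the fold-const.c code: the function name one_input_selector and the representation of a selector as a plain array of indices are assumptions made for the example. It only shows when a two-input selector can be rewritten to refer to a single input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not GCC code.  A permutation of two NELTS-element vectors
   is described by NELTS selector indices in [0, 2 * NELTS): indices
   below NELTS pick from the first input, the others from the second.  */

/* Try to rewrite SEL so that it refers to a single input.  SAME_INPUTS
   says whether both operands are the same vector.  On success, return
   true, store in *WHICH the input still needed (0 or 1), and leave only
   indices in [0, NELTS) in SEL.  Otherwise return false and leave SEL
   unchanged.  (Hypothetical helper, named for this example only.)  */
static bool
one_input_selector (unsigned *sel, size_t nelts, bool same_inputs,
                    int *which)
{
  assert (sel != NULL && which != NULL);

  bool all_in_first = true;
  bool all_in_second = true;
  for (size_t i = 0; i < nelts; i++)
    {
      assert (sel[i] < 2 * nelts);
      if (sel[i] < nelts)
        all_in_second = false;
      else
        all_in_first = false;
    }

  if (!same_inputs && !all_in_first && !all_in_second)
    return false;               /* Genuinely needs both inputs.  */

  /* With identical inputs, index i and index i + NELTS name the same
     element, so every index can be folded into the first half.  When
     all indices are in the second half, the same shift applies and the
     second input becomes the only one.  */
  for (size_t i = 0; i < nelts; i++)
    if (sel[i] >= nelts)
      sel[i] -= (unsigned) nelts;

  *which = (same_inputs || all_in_first) ? 0 : 1;
  return true;
}
```

With four elements, the selector {4, 5, 6, 7} on distinct inputs becomes {0, 1, 2, 3} on the second input alone, while {0, 5, 2, 7} can only be reduced when both inputs are the same vector.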
It may be hard for optabs.c. The target expanders should really be fixed.
This isn't the same as when we were talking of combining any permutations;
here it is something that should be fairly easy to do. In that sense, simply
avoiding the ICE (at least in release mode) and leaving optimization to the
targets should be enough. Still, I am proposing something closer to Jakub's
suggestion below.
How about moving the VEC_PERM_EXPR arg2 == VECTOR_CST folding
into a separate function with single_arg argument, call it with
operand_equal_p (op0, op1, 0) initially and call that function again
if single_arg and !can_vec_perm_p (...), that time with
single_arg parameter false?
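A literal reading of this suggestion could look like the following C sketch. It is a toy model under stated assumptions, not a patch: perm_query_fn stands in for a target query such as can_vec_perm_p, and fold_selector, fold_selector_with_retry and the two toy targets are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a target query such as can_vec_perm_p.  */
typedef bool (*perm_query_fn) (const unsigned *sel, size_t nelts);

/* Write to OUT the selector obtained from SEL.  With SINGLE_ARG, indices
   are reduced modulo NELTS (valid only when both inputs are the same
   vector); otherwise they are reduced modulo 2 * NELTS.  */
static void
fold_selector (const unsigned *sel, size_t nelts, bool single_arg,
               unsigned *out)
{
  assert (sel != NULL && out != NULL && nelts > 0);
  size_t range = single_arg ? nelts : 2 * nelts;
  for (size_t i = 0; i < nelts; i++)
    out[i] = (unsigned) (sel[i] % range);
}

/* Fold once with single_arg taken from OPERANDS_EQUAL.  If that was a
   single-argument fold and the target rejects the result, fold again
   with single_arg false.  */
static void
fold_selector_with_retry (const unsigned *sel, size_t nelts,
                          bool operands_equal, perm_query_fn can_permute,
                          unsigned *out)
{
  fold_selector (sel, nelts, operands_equal, out);
  if (operands_equal && !can_permute (out, nelts))
    fold_selector (sel, nelts, false, out);
}

/* Toy target that accepts every selector.  */
static bool
toy_accepts_all (const unsigned *sel, size_t nelts)
{
  (void) sel;
  (void) nelts;
  return true;
}

/* Deliberately odd toy target: it rejects any selector that refers only
   to the first input, to imitate a target that fails on the
   single-argument form.  */
static bool
toy_rejects_one_input (const unsigned *sel, size_t nelts)
{
  for (size_t i = 0; i < nelts; i++)
    if (sel[i] >= nelts)
      return true;
  return false;
}
```

With the selector {0, 5, 2, 7} and equal operands, toy_accepts_all yields {0, 1, 2, 3}, whereas toy_rejects_one_input makes the retry keep {0, 5, 2, 7}. The sketch does not check whether the target accepts the selector produced by the second call.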
If the original permutation is !can_vec_perm_p, I believe we should
transform. That can indeed be done with a function. I did it without a
function, but I can try to rewrite it if you want.
Or how about removing the code instead and doing it during
vector lowering if the original permute is not !can_vec_perm_p?
I'd rather not delay that much.
Here is a proposed patch that passed bootstrap+testsuite on x86_64-linux-gnu.
I did not test on arm (or whichever platform it was that failed). If
can_vec_perm_p is costly, we can test single_arg first (otherwise the two
can_vec_perm_p calls must give the same answer) and test PROP_gimple_lvec
before the second can_vec_perm_p call (which must answer true then).
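The ordering of checks described in this paragraph can be sketched in C as follows. This is a toy model, not the patch itself: perm_query_fn stands in for can_vec_perm_p, the after_vector_lowering flag stands in for testing PROP_gimple_lvec, and keep_canonical_selector, toy_query_count and toy_rejects_one_input are hypothetical names used only here. It also assumes, as stated above, that without single_arg both queries would give the same answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a target query such as can_vec_perm_p.  */
typedef bool (*perm_query_fn) (const unsigned *sel, size_t nelts);

/* Decide whether to keep the canonicalized selector CANON instead of
   the original selector ORIG, asking the target as rarely as possible.  */
static bool
keep_canonical_selector (bool single_arg, bool after_vector_lowering,
                         const unsigned *canon, const unsigned *orig,
                         size_t nelts, perm_query_fn can_permute)
{
  assert (canon != NULL && orig != NULL && can_permute != NULL);

  /* Cheap test first: without single_arg the two queries are assumed
     to agree, so there is nothing to decide.  */
  if (!single_arg)
    return true;

  if (can_permute (canon, nelts))
    return true;

  /* CANON is not supported.  After vector lowering the original is
     assumed to be supported, so keep it without a second query.  */
  if (after_vector_lowering)
    return false;

  /* Before lowering, fall back to ORIG only if the target supports it;
     if neither form is supported, canonicalize anyway.  */
  return !can_permute (orig, nelts);
}

/* Number of target queries made so far, to observe the ordering.  */
static int toy_query_count;

/* Deliberately odd toy target: it rejects any selector that refers only
   to the first input.  */
static bool
toy_rejects_one_input (const unsigned *sel, size_t nelts)
{
  toy_query_count++;
  for (size_t i = 0; i < nelts; i++)
    if (sel[i] >= nelts)
      return true;
  return false;
}
```

In this sketch the target is not queried at all when single_arg is false, and only once when the canonical selector is rejected after vector lowering.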
I don't remember what the arg2 == op2 test is about, so I kept it. I also
didn't try to fix the TREE_SIDE_EFFECTS handling; a quick test didn't trigger
that issue.
2014-11-03  Marc Glisse  <marc.gli...@inria.fr>

	PR tree-optimization/63666
	* fold-const.c: Include "optabs.h".
	(fold_ternary_loc) <VEC_PERM_EXPR>: Avoid canonicalizing a
	can_vec_perm_p permutation to one that is not.

(sorry for the incomplete ChangeLog in the previous email)
--
Marc Glisse