On Sun, 30 Sep 2012, Marc Glisse wrote:
On Sat, 29 Sep 2012, Eric Botcazou wrote:
this patch lets the compiler try to rewrite:
(vec_concat (vec_select x [a]) (vec_select x [b]))
as:
vec_select x [a b]
or even just "x" if appropriate.
[...]
Why not generalizing to all kinds of VEC_SELECTs instead of just scalar
ones?
Ok, I changed the patch a bit to handle arbitrary VEC_SELECTs, and moved the
identity recognition to VEC_SELECT handling (where it belonged). Testing with
non-scalar VEC_SELECTs was limited though, because they are not that easy to
generate. Also, the identity case is the only one where it actually
optimized. To handle more cases, I'd have to look through several layers of
VEC_SELECTs, which gets a bit complicated (for instance, the permutation
0,1,3,2 will appear as a vec_concat of a vec_select(v,[0,1]) and a
vec_select(vec_select(v,[2,3]),[1,0]), or worse with a vec_concat in the
middle). It also didn't optimize 3,2,3,2, possibly because that meant
substituting the same rtx twice (I didn't go that far in gdb). Then there is
also the vec_duplicate case (I should try to replace vec_duplicate with
vec_concat in simplify-rtx to see what happens...). Still, the identity case
is nice to have.
I think I may have been a bit too hasty removing the restriction to the
scalar case (and even that version was possibly wrong for some targets).
For instance, if I have a vec_concat of 2 permutations of a vector, this
will generate a vec_select with a result twice the size of its input,
which will likely fail to be recognized. For now I have only managed to
reach this situation in combine, where the unrecognizable expression is
simply ignored. But I think simplify-rtx is used in less forgiving places
(and even in combine it could cause optimizations to be missed).
My current understanding of simplify-rtx is that we should only do "safe"
optimizations in it (make sure we only create expressions that every
target will recognize), and if I want more advanced optimizations, I
should do them elsewhere (not sure where). So I should probably at least
restrict this one to the case where the result and XEXP (trueop0, 0) have
the same mode.
Does that make sense? Or is the current code ok and I am worrying for
nothing?
For reference, the conversation started here:
http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00540.html
and I include a copy of the relevant part of the patch that was committed:
2012-09-09 Marc Glisse <marc.gli...@inria.fr>
gcc/
* simplify-rtx.c (simplify_binary_operation_1)
<VEC_CONCAT>: Handle VEC_SELECTs from the same vector.
+ /* Try to merge VEC_SELECTs from the same vector into a single one. */
+ if (GET_CODE (trueop0) == VEC_SELECT
+ && GET_CODE (trueop1) == VEC_SELECT
+ && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)))
+ {
+ rtx par0 = XEXP (trueop0, 1);
+ rtx par1 = XEXP (trueop1, 1);
+ int len0 = XVECLEN (par0, 0);
+ int len1 = XVECLEN (par1, 0);
+ rtvec vec = rtvec_alloc (len0 + len1);
+ for (int i = 0; i < len0; i++)
+ RTVEC_ELT (vec, i) = XVECEXP (par0, 0, i);
+ for (int i = 0; i < len1; i++)
+ RTVEC_ELT (vec, len0 + i) = XVECEXP (par1, 0, i);
+ return simplify_gen_binary (VEC_SELECT, mode, XEXP (trueop0, 0),
+ gen_rtx_PARALLEL (VOIDmode, vec));
+ }
--
Marc Glisse