On Sun, 14 Oct 2012, Marc Glisse wrote:
On Sun, 14 Oct 2012, Uros Bizjak wrote:
On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse <marc.gli...@inria.fr> wrote:
Hello,
this patch provides an alternate pattern to let combine recognize scalar
operations that preserve the high part of a vector. If the strategy is all
right, I could do the same for more operations (mul, div, ...). Something
similar is also possible for V4SF (different pattern though), but probably
not as useful.
But, we _do_ have vec_merge pattern that describes the operation.
Adding another one to each operation just to satisfy combine is IMO
not correct approach.
At some point I wondered about _replacing_ the existing pattern, so there
would only be one ;-)
The vec_merge pattern takes as argument 2 vectors instead of a vector and a
scalar, and describes the operation as a vector operation where we drop half
of the result, instead of a scalar operation where we re-add the top half of
the vector. I don't know if that's the most convenient choice. Adding code in
simplify-rtx to replace vec_merge with vec_concat / vec_select might be
easier than the other way around.
If the middle-end somehow gave us:
(plus X (vec_concat Y 0))
it would seem a bit strange to add an optimization that turns it into:
(vec_merge (plus X (subreg:V2DF Y)) X 1)
but then producing:
(vec_concat (plus (vec_select X 0) Y) (vec_select X 1))
would be strange as well.
(ignoring the signed zero issues here)
I'd rather see generic RTX simplification that
simplifies your proposed pattern to vec_merge pattern.
Ok, I'll see what I can do.
Also, as you mention in PR54855, Comment #5, the approach is too fragile...
I am not sure I can make the RTX simplification much less fragile... Whenever
I see (vec_concat X (vec_select Y 1)), I would have to check whether X is
some (possibly large) tree of scalar computations involving Y[0], move it all
to vec_merge computations, and fix other users of some of those scalars to
now use S[0]. Seems too hard, I would stop at single-operation X that is used
only once. Besides, the gain is larger in proportion when there is a single
operation :-)
Thank you for your comments,
Hello,
I experimented with the simplify-rtx transformation you suggested, see:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
It works when the argument is a register, but not for memory (which is
where the constant is in the testcase). And the description of the
operation in sse.md does seem problematic. It says the second argument is:
(match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
but Intel's documentation says "The source operand can be an XMM register
or a 64-bit memory location", not quite the same.
Do you think the .md description should really stay this way, or could we
change it to something that better reflects "64-bit memory location"?
--
Marc Glisse