On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote:
On Saturday 10 December 2016, Marc Glisse wrote:
On Sat, 10 Dec 2016, Marc Glisse wrote:
On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote:
Replaces the definitions of the shift intrinsics with GCC extension
syntax to
allow GCC to reason about what the instructions do. Tests are added to
ensure the intrinsics still produce the right instructions, and that a
few basic optimizations now work.
I don't think we can do it in such a straightforward way. Those
intrinsics are well defined for any shift argument, while operator<< and
operator>> are considered undefined for many input values. I believe you
may need to use an unsigned type for the lhs of left shifts, and write a
<< (b & 31) to match the semantics for the rhs (I haven't checked
Intel's doc). Which might require adding a few patterns in sse.md to
avoid generating code for that AND.
Oops, apparently I got that wrong; I got confused with the scalar case. Left
shift by more than precision is well defined for vectors, but it gives 0,
it doesn't take the shift count modulo the precision. Which is even harder
to explain to gcc (maybe ((unsigned)b<=31)?a<<b:0...). Yes, the way we
model operations in gcc is often inconvenient.
There was a similar issue for LLVM (https://reviews.llvm.org/D3353).
Well, it is undefined behaviour by the C standard, but is it also undefined
inside GCC (since this specifically uses a GCC extension)? I would assume it
just produces 0.
In the optimizers, we tend to handle vectors the same as scalars. I don't
remember if we have any optimization that takes advantage of the limited
range for the second operand of shifts (mostly we disable optimizations
when there is a risk they might produce a shift amount larger than prec),
but I don't think we have specified the behavior for larger-than-prec
shifts, so it would be dangerous. Note that depending on the platform, gcc
may lower vector operations to smaller vectors or even scalars, which
makes it inconvenient to have different semantics for vectors and scalars
(not impossible, we would for instance need to teach the lowering pass
that vec1 << vec2 lowers to { ((unsigned)vec2[0]<=31)?vec1[0]<<vec2[0]:0,
((unsigned)vec2[1]<=31)?vec1[1]<<vec2[1]:0 }). I think that on some platforms,
shifting vectors uses the modulo behavior, or handles negative shift
values as shifts in the other direction; we can't look at just one target.
We could have a middle-end policy that vec << 1234 is not undefined (we
cannot assume it doesn't happen) but that we don't say what it is defined
to, so we avoid any optimization that may have it as input or output. That
might mean disabling all shift optimizations though, which would be
counterproductive.
The non-immediate case is simpler, then, because it produces
the instructions, which will inherently return the right thing.
It will only generate the instructions you expect if it hasn't already
optimized it to something else, which might not even involve a shift
anymore.
I expect reviewers (I am only commenting) will need convincing arguments
that this is safe (I could be wrong, maybe they'll find reasons that I
missed that make it obviously safe).
--
Marc Glisse