On Wed, Jul 17, 2019 at 05:22:38PM +0800, Kewen.Lin wrote:
> Good question, the vector rotation for byte looks like (others are similar):
>
> vrlb VRT,VRA,VRB
> do i=0 to 127 by 8
> sh = (VRB)[i+5:i+7]
> VRT[i:i+7] = (VRA)[i:i+7] <<< sh
> end
>
> It only takes care of the counts from 0 to prec-1 (inclusive) [log2(prec)
> bits], so it's fine even if operands[2] is zero or negative.
>
> Take byte as example, prec is 8.
> - rot count is 0, then minus res gets 8 (out of 3 bits range), same as 0.
> - rot count is 9, then minus res gets -1 (3 bits parsed as 7), the original
>   rot count 9 was parsed as 1 (in 3 bits range).
> - rot count is -1, then minus res gets 9 (3 bits parsed as 1), the original
>   rot count was parsed as 7 (in 3 bits range).
>
> It's a good idea to just use negate! Thanks!!

Ok, so the hw for the vectors truncates; the question is how happy the
generic RTL code will be with that.  rs6000 defines SHIFT_COUNT_TRUNCATED to 0,
so the generic code can't assume there is a truncation going on.  Either it
will punt on some optimizations when it sees, say, a negative or too large
shift/rotate count (that is the better case), or it might just assume there
is UB.
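
To make the truncation being discussed concrete, here is a rough C model of
a single byte lane of vrlb, following the quoted pseudocode.  The function
names are made up for illustration and this is only a sketch of the hardware
behaviour, not anything taken from the rs6000 backend:

```c
#include <stdint.h>

/* Rotate one byte lane left using only the low 3 bits of the count,
   mirroring sh = (VRB)[i+5:i+7] in the vrlb pseudocode.  */
static uint8_t rotl8_lane(uint8_t a, int count)
{
    unsigned sh = (unsigned)count & 7u;
    /* The second shift amount is masked too, so sh == 0 stays defined.  */
    return (uint8_t)((a << sh) | (a >> ((8u - sh) & 7u)));
}

/* Rotate right expressed as rotate left by (prec - count), prec == 8.
   Only correct because rotl8_lane truncates the count itself.  */
static uint8_t rotr8_via_sub(uint8_t a, int count)
{
    return rotl8_lane(a, 8 - count);
}
```

With a = 0x81, counts 0 and 8 both leave the value unchanged, 9 behaves as 1
and -1 behaves as 7, which matches the three cases listed in the quote.  The
point is that this is a property of the instruction, not something the generic
code is told about.
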
As the documentation says, when SHIFT_COUNT_TRUNCATED is zero there is the
option of having a pattern with the truncation made explicit.  So in your case
that would be *vrotl<mode>3_and or similar, which would have an explicit AND
on the shift operand with, say, a {7, 7, ...} vector for the byte shifts etc.,
but emit in the end the identical instruction to vrotl<mode>3; then use MINUS +
that pattern for vrotr<mode>3.  If the rotate argument is a CONST_VECTOR, you
can of course just canonicalize, i.e. perform -operands[2] & mask, fold that
into a constant and keep using vrotl<mode>3 in that case.
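
For the constant case, the canonicalization amounts to something like the
following in plain C.  This is only a sketch of the arithmetic: the helper
name and the array-based interface are hypothetical, and a real expander would
build a new CONST_VECTOR rather than fill an array:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn constant rotate-right counts into the
   equivalent in-range rotate-left counts, (-count) & (prec - 1) per
   lane.  prec must be a power of two (8 for byte lanes).  */
static void canon_rotr_counts(const int *in, unsigned *out, size_t n,
                              unsigned prec)
{
    unsigned mask = prec - 1u;
    for (size_t i = 0; i < n; i++)
        out[i] = (0u - (unsigned)in[i]) & mask;
}

/* Reference byte rotates; sh must already be in the range 0..7.  */
static uint8_t rotl8(uint8_t a, unsigned sh)
{
    return (uint8_t)((a << sh) | (a >> ((8u - sh) & 7u)));
}

static uint8_t rotr8(uint8_t a, unsigned sh)
{
    return (uint8_t)((a >> sh) | (a << ((8u - sh) & 7u)));
}
```

For byte lanes, rotate-right counts {0, 1, 9, -1} become rotate-left counts
{0, 7, 7, 1}, all within 0..7, so nothing out of range is ever visible to the
generic code.
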
Jakub