Richard Sandiford <[email protected]> writes:
> Richard Biener via Gcc-patches <[email protected]> writes:
>> The following makes sure to limit the shift operand when vectorizing
>> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
>> operand otherwise invokes undefined behavior. When we determine
>> whether we can demote the operand we know we at most shift in the
>> sign bit so we can adjust the shift amount.
>>
>> Note this has the possibility of un-CSEing common shift operands
>> as there's no good way to share pattern stmts between patterns.
>> We'd have to separately pattern recognize the definition.
>>
>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>
>> Not sure about LSHIFT_EXPR, it probably has the same issue but
>> the fallback optimistic zero for out-of-range shifts is at least
>> "corrrect". Not sure we ever try to demote rotates (probably not).
>
> I guess you mean "correct" for x86? But that's just a quirk of x86.
> IMO the behaviour is equally wrong for LSHIFT_EXPR.

Sorry for the multiple messages. I wanted to get something out quickly
because I wasn't sure how long it would take me to write this...

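To make the discussion concrete: the demotion in question clamps the
shift amount to the narrow precision minus one, which is safe for
arithmetic right shifts because only sign bits are shifted in.  A
standalone check of that identity (just an illustration, not the
pattern code):

#include <assert.h>
#include <stdint.h>

/* Check that for every 16-bit x, (short)((int)x >> 31) equals
   (short)(x >> 15), i.e. an out-of-range shift amount can be clamped
   to precision - 1 when only the sign bit is shifted in.  */
int
main (void)
{
  for (int32_t i = INT16_MIN; i <= INT16_MAX; ++i)
    {
      int16_t x = (int16_t) i;
      int16_t wide = (int16_t) ((int32_t) x >> 31);
      int16_t narrow = (int16_t) (x >> 15);
      assert (wide == narrow);
    }
  return 0;
}
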
On rotates, for:

void
foo (unsigned short *restrict ptr)
{
  for (int i = 0; i < 200; ++i)
    {
      unsigned int x = ptr[i] & 0xff0;
      ptr[i] = (x << 1) | (x >> 31);
    }
}

we do get:

  can narrow to unsigned:13 without loss of precision: _5 = x_12 r>> 31;

(the mask leaves 12 significant bits, so the rotated result needs 13),
although aarch64 doesn't provide rrotate patterns, so nothing actually
comes of it.

I think the handling of variable shifts is flawed for other reasons.
Given:

void
uu (unsigned short *restrict ptr1, unsigned short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
    ptr1[i] = ptr1[i] >> ptr2[i];
}

void
us (unsigned short *restrict ptr1, short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
    ptr1[i] = ptr1[i] >> ptr2[i];
}

void
su (short *restrict ptr1, unsigned short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
    ptr1[i] = ptr1[i] >> ptr2[i];
}

void
ss (short *restrict ptr1, short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
    ptr1[i] = ptr1[i] >> ptr2[i];
}

we only narrow uu and ss, due to:

  /* Ignore codes that don't take uniform arguments.  */
  if (!types_compatible_p (TREE_TYPE (op), type))
    return;

in vect_determine_precisions_from_range (in us and su the shift
amount's type differs in signedness from the shifted operand's, so
the check rejects them).  Maybe we should drop the shift handling
from there and instead rely on
vect_determine_precisions_from_users, extending:

  if (TREE_CODE (shift) != INTEGER_CST
      || !wi::ltu_p (wi::to_widest (shift), precision))
    return;

to handle ranges where the max is known to be < precision.
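
Something along these lines, perhaps (completely untested, and only a
sketch: I'm assuming the ranger API here, and the surrounding names
are taken from the existing code from memory):

  if (TREE_CODE (shift) != INTEGER_CST)
    {
      /* Sketch only: accept a variable shift amount whose range is
         known to stay below the precision we want to narrow to.
         A signed shift type would also need its lower bound checked.  */
      int_range_max r;
      if (!get_range_query (cfun)->range_of_expr (r, shift)
          || r.undefined_p ()
          || wi::geu_p (widest_int::from (r.upper_bound (),
                                          TYPE_SIGN (TREE_TYPE (shift))),
                        precision))
        return;
    }
  else if (!wi::ltu_p (wi::to_widest (shift), precision))
    return;
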
There again, if masking is enough for right shifts and right rotates,
maybe we should keep the current handling for them (with your fix)
and skip the types_compatible_p check for those cases.

So:

- restrict shift handling in vect_determine_precisions_from_range to
  RSHIFT_EXPR and RROTATE_EXPR
- remove the types_compatible_p restriction for those cases
  (a sketch of these two changes follows the list)
- extend the vect_determine_precisions_from_users shift handling to
  check for ranges on the shift amount (as sketched above)
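
For the first two points, I'm imagining something like this around
the existing types_compatible_p check (again untested, and the
surrounding structure is from memory):

  if (code == LSHIFT_EXPR)
    /* Left shifts can't be fixed up by clamping the shift amount,
       so leave them to vect_determine_precisions_from_users.  */
    return;
  else if (code == RSHIFT_EXPR || code == RROTATE_EXPR)
    /* The shift amount needn't be uniform with the shifted operand:
       an out-of-range amount can be clamped (right shifts) or
       masked (right rotates).  */
    ;
  /* Ignore codes that don't take uniform arguments.  */
  else if (!types_compatible_p (TREE_TYPE (op), type))
    return;
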
Does that sound right?
Thanks,
Richard