Richard Biener <[email protected]> writes:
> On Tue, 1 Aug 2023, Richard Sandiford wrote:
>
>> Richard Sandiford <[email protected]> writes:
>> > Richard Biener via Gcc-patches <[email protected]> writes:
>> >> The following makes sure to limit the shift operand when vectorizing
>> >> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
>> >> operand otherwise invokes undefined behavior. When we determine
>> >> whether we can demote the operand we know we at most shift in the
>> >> sign bit so we can adjust the shift amount.
>> >>
>> >> Note this has the possibility of un-CSEing common shift operands
>> >> as there's no good way to share pattern stmts between patterns.
>> >> We'd have to separately pattern recognize the definition.
>> >>
>> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> >>
>> >> Not sure about LSHIFT_EXPR, it probably has the same issue but
>> >> the fallback optimistic zero for out-of-range shifts is at least
>> >> "corrrect". Not sure we ever try to demote rotates (probably not).
>> >
>> > I guess you mean "correct" for x86? But that's just a quirk of x86.
>> > IMO the behaviour is equally wrong for LSHIFT_EXPR.
>
> I meant "correct" for the constant folding that evaluates out-of-bound
> shifts as zero.
>
>> Sorry for the multiple messages. Wanted to get something out quickly
>> because I wasn't sure how long it would take me to write this...
>>
>> On rotates, for:
>>
>> void
>> foo (unsigned short *restrict ptr)
>> {
>> for (int i = 0; i < 200; ++i)
>> {
>> unsigned int x = ptr[i] & 0xff0;
>> ptr[i] = (x << 1) | (x >> 31);
>> }
>> }
>>
>> we do get:
>>
>> can narrow to unsigned:13 without loss of precision: _5 = x_12 r>> 31;
>>
>> although aarch64 doesn't provide rrotate patterns, so nothing actually
>> comes of it.
>
> I think it's still correct that we only need unsigned:13 for the input,
> we know other bits are zero. But of course when actually applying
> this as documented
>
> /* Record that STMT_INFO could be changed from operating on TYPE to
> operating on a type with the precision and sign given by PRECISION
> and SIGN respectively.
>
> the operation itself has to be altered (the above doesn't suggest
> promoting/demoting the operands to TYPE is the only thing to do).
>
> So it seems to be the burden is on the consumers of the information?
Yeah, textually that seems fair. Not sure I was thinking of it in
those terms at the time though. :)
>> I think the handling of variable shifts is flawed for other reasons. Given:
>>
>> void
>> uu (unsigned short *restrict ptr1, unsigned short *restrict ptr2)
>> {
>> for (int i = 0; i < 200; ++i)
>> ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>>
>> void
>> us (unsigned short *restrict ptr1, short *restrict ptr2)
>> {
>> for (int i = 0; i < 200; ++i)
>> ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>>
>> void
>> su (short *restrict ptr1, unsigned short *restrict ptr2)
>> {
>> for (int i = 0; i < 200; ++i)
>> ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>>
>> void
>> ss (short *restrict ptr1, short *restrict ptr2)
>> {
>> for (int i = 0; i < 200; ++i)
>> ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>>
>> we only narrow uu and ss, due to:
>>
>> /* Ignore codes that don't take uniform arguments. */
>> if (!types_compatible_p (TREE_TYPE (op), type))
>> return;
>
> I suppose that's because we care about the shift operand at all here.
> We could possibly use [0 .. precision-1] as known range for it
> and only if that doesn't fit 'type' give up (and otherwise simply
> ignore the input range of the shift operands here).
>
>> in vect_determine_precisions_from_range. Maybe we should drop
>> the shift handling from there and instead rely on
>> vect_determine_precisions_from_users, extending:
>>
>> if (TREE_CODE (shift) != INTEGER_CST
>> || !wi::ltu_p (wi::to_widest (shift), precision))
>> return;
>>
>> to handle ranges where the max is known to be < precision.
>>
>> There again, if masking is enough for right shifts and right rotates,
>> maybe we should keep the current handling for then (with your fix)
>> and skip the types_compatible_p check for those cases.
>
> I think it should be enough for left-shifts as well? If we lshift
> out like 0x100 << 9 so the lhs range is [0,0] the input range from
> op0 will still make us use HImode. I think we only ever get overly
> conservative answers for left-shifts from this function?
But if we have:
short x, y;
int z = (int) x << (int) y;
and at runtime, x == 1, y == 16, (short) z should be 0 (no UB),
whereas x << y would invoke UB and x << (y & 15) would be 1.
> Whatever works for RROTATE should also work for LROTATE.
I think the same problem affects LROTATE.
>> So:
>>
>> - restrict shift handling in vect_determine_precisions_from_range to
>> RSHIFT_EXPR and RROTATE_EXPR
>>
>> - remove types_compatible_p restriction for those cases
>>
>> - extend vect_determine_precisions_from_users shift handling to check
>> for ranges on the shift amount
>>
>> Does that sound right?
>
> I'm not sure. This all felt somewhat fragile when looking closer
> (I was hoping you would eventually tackle it from the older
> referenced bug) ... so my main idea was to perform incremental changes
> where I have test coverage (as with the new wrong-code bug).
Fair :)
> Originally I completely disabled shift support but that regressed
> the over-widen testcases a lot which at least have widened shifts
> by constants a lot.
>
> x86 has vector rotates only for AMD XOP (which is dead) plus
> some for V1TImode AFAICS, but I think we pattern-match rotates
> to shifts, so maybe the precision stuff is interesting for the
> case where we match the pattern rotate sequence for widenings?
>
> So for the types_compatible_p issue something along
> the following? We could also exempt the shift operand from
> being covered by min_precision so the consumer would have
> to make sure it can be represented (I think that's never going
> to be an issue in practice until we get 256bit integers vectorized).
> It will have to fixup the shift operands anyway.
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index e4ab8c2d65b..cdeeaf98a47 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -6378,16 +6378,26 @@ vect_determine_precisions_from_range
> (stmt_vec_info stmt_info, gassign *stmt)
> }
> else if (TREE_CODE (op) == SSA_NAME)
> {
> - /* Ignore codes that don't take uniform arguments. */
> - if (!types_compatible_p (TREE_TYPE (op), type))
> + /* Ignore codes that don't take uniform arguments. For shifts
> + the shift amount is known to be in-range. */
I guess it's more "we can assume that the amount is in range"?
> + if (code == LSHIFT_EXPR
> + || code == RSHIFT_EXPR
> + || code == LROTATE_EXPR
> + || code == RROTATE_EXPR)
> + {
> + min_value = wi::min (min_value, 0, sign);
> + max_value = wi::max (max_value, TYPE_PRECISION (type),
> sign);
LGTM for shifts right. Because of the above lshift thing, I think we
need something like:
if (code == LSHIFT_EXPR || code == LROTATE_EXPR)
{
wide_int op_min_value, op_max_value;
if (!vect_get_range_info (op, &op_min_value, op_max_value))
return;
/* We can ignore left shifts by negative amounts, which are UB. */
min_value = wi::min (min_value, 0, sign);
/* Make sure the highest non-UB shift amount doesn't become UB. */
op_max_value = wi::umin (op_max_value, TYPE_PRECISION (type));
auto mask = wi::mask (TYPE_PRECISION (type), false,
op_max_value.to_uhwi ());
max_value = wi::max (max_value, mask, sign);
}
Does that look right?
Thanks, and sorry for the scope creep.
Richard
> + }
> + else if (types_compatible_p (TREE_TYPE (op), type))
> + {
> + wide_int op_min_value, op_max_value;
> + if (!vect_get_range_info (op, &op_min_value,
> &op_max_value))
> + return;
> + min_value = wi::min (min_value, op_min_value, sign);
> + max_value = wi::max (max_value, op_max_value, sign);
> + }
> + else
> return;
> -
> - wide_int op_min_value, op_max_value;
> - if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> - return;
> -
> - min_value = wi::min (min_value, op_min_value, sign);
> - max_value = wi::max (max_value, op_max_value, sign);
> }
> else
> return;