Re: [AArch64] Fix integer vabs intrinsics

Richard Earnshaw Wed, 07 May 2014 03:40:13 -0700

On 07/05/14 11:32, Richard Biener wrote:
> On Wed, May 7, 2014 at 12:30 PM, Richard Earnshaw <[email protected]> wrote:
>> On 05/05/14 09:04, Richard Biener wrote:
>>> On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw <[email protected]> wrote:
>>>> On 02/05/14 11:28, James Greenhalgh wrote:
>>>>> On Fri, May 02, 2014 at 10:29:06AM +0100, [email protected] wrote:
>>>>>>
>>>>>>
>>>>>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <[email protected]> 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>>>>>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>>>>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should 
>>>>>>>>> behave as
>>>>>>>>> the hardware. That is to say, the pseudo-code sequence:
>>>>>>>>
>>>>>>>>
>>>>>>>> Only for signed integer types.  You should be able to use an unsigned
>>>>>>>> integer type here instead.
>>>>>>>
>>>>>>> If anything, I think that puts us in a worse position.
>>>>>>
>>>>>> Not if you cast it back.
>>>>>>
>>>>>>
>>>>>>> The issue that
>>>>>>> inspires this patch is that GCC will happily fold:
>>>>>>>
>>>>>>>  t1 = ABS_EXPR (x)
>>>>>>>  t2 = GE_EXPR (t1, 0)
>>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>>  t2 = TRUE
>>>>>>>
>>>>>>> Surely an unsigned integer type is going to suffer the same fate? 
>>>>>>> Certainly I
>>>>>>> can imagine somewhere in the compiler there being a fold path for:
>>>>>>
>>>>>> Yes but if add a cast from the unsigned type to the signed type gcc does 
>>>>>> not
>>>>>> optimize that. If it does it is a bug since the overflow is defined 
>>>>>> there.
>>>>>
>>>>> I'm not sure I understand, are you saying I want to fold to:
>>>>>
>>>>>   t1 = VIEW_CONVERT_EXPR (x, unsigned)
>>>>>   t2 = ABS_EXPR (t1)
>>>>>   t3 = VIEW_CONVERT_EXPR (t2, signed)
>>>>>
>>>>> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
>>>>> other out leading to an overall NOP? It might just be Friday morning and a
>>>>> lack of coffee talking, but I think I need you to spell this one out to
>>>>> me in big letters!
>>>>>
>>>>
>>>> I agree.  I think what you need is a type widening so that you get
>>>>
>>>> t1 = VEC_WIDEN (x)
>>>> t2 = ABS_EXPR (t1)
>>>> t3 = VEC_NARROW (t2)
>>>>
>>>> This then guarantees that the ABS expression cannot be undefined.  I'm
>>>> less sure, however about the narrow causing a change in 'sign'.  Has it
>>>> just punted the problem?  Maybe you need
>>>
>>> Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED
>>> result type, thus do abs(int) -> unsigned (what we have as absu_hwi).
>>> That is, have an ABS_EXPR that doesn't have the undefined issue
>>> (at expense of optimization in case the result is immediately casted
>>> back to signed)
>>>
>>
>> Yes, that would make more sense, and is, in effect, what the ARM VABS
>> instruction is doing (producing an unsigned result with no undefined
>> behaviour).
>>
>> I'm not sure I understand your 'at expense of optimization' comment,
>> though.  Surely a cast back to signed is essentially a no-op, since
>> there's no representational change in the value (at least, not on 2's
>> complement machines)?
> 
> We can't derive a value range of [0, INT_MAX] for the (int)ABSU_EXPR.
>


Unless you're assuming that ABS_EXPR(INT_MIN) will always trap, then if
you can derive it for ABS_EXPR (which really returns [0,
INT_MAX]+UNSPECIFIED, I don't really see why you can't derive it for
(int)ABSU_EXPR, which returns [0, INT_MAX]+INT_MIN, since the latter is
a subset of the former).

R.

> Richard.
> 
>>
>>> Richard.
>>>
>>>>
>>>> t1 = VEC_WIDEN (x)
>>>> t2 = ABS_EXPR (t1)
>>>> t3 = VIEW_CONVERT_EXPR (x, unsigned)
>>>> t4 = VEC_NARROW (t3)
>>>> t5 = VIEW_CONVERT_EXPR (t4, signed)
>>>>
>>>> !!!
>>>>
>>>> How you capture this into RTL during expand, though, is another thing.
>>>>
>>>> R.
>>>>
>>>>>>>
>>>>>>>  (unsigned >= 0) == TRUE
>>>>>>>
>>>>>>>>>
>>>>>>>>>  a = vabs_s8 (vdup_n_s8 (-128));
>>>>>>>>>  assert (a >= 0);
>>>>>>>>>
>>>>>>>>> does not hold. As in hardware
>>>>>>>>>
>>>>>>>>>  abs (-128) == -128
>>>>>>>>>
>>>>>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we 
>>>>>>>>> should avoid
>>>>>>>>> it. In fact, we have to be even more careful than that, and keep the 
>>>>>>>>> integer
>>>>>>>>> vabs intrinsics as an unspec in the back end.
>>>>>>>>
>>>>>>>> No it is not.  The mistake is to use signed integer types here.  Just
>>>>>>>> add a conversion to an unsigned integer vector and it will work
>>>>>>>> correctly.
>>>>>>>> In fact the ABS rtl code is not undefined for the overflow.
>>>>>>>
>>>>>>> Here we are covering ourselves against a seperate issue. For 
>>>>>>> auto-vectorized
>>>>>>> code we want the SABD combine patterns to kick in whenever sensible. For
>>>>>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an 
>>>>>>> underflow:
>>>>>>>
>>>>>>>  vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>>>>>>>
>>>>>>> So in this case, the combine would be erroneous. Likewise SABA.
>>>>>>
>>>>>> This sounds like it would problematic for unsigned types  and not just 
>>>>>> for
>>>>>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
>>>>>> instead. Since in rtl overflow and underflow is defined to be wrapping.
>>>>>
>>>>> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
>>>>> with unsigned types. Further, I have never thought of RTL having signed
>>>>> and unsigned types, just a bag of bits. We'll want to use unspec for the
>>>>> intrinsic version of vabd_s8 - but we'll want to specify the
>>>>>
>>>>>   (abs (minus (reg) (reg)))
>>>>>
>>>>> behaviour so that auto-vectorized code can pick it up.
>>>>>
>>>>> So in the end we'll have these patterns:
>>>>>
>>>>>   (abs
>>>>>     (abs (reg)))
>>>>>
>>>>>   (intrinsic_abs
>>>>>     (unspec [(reg)] UNSPEC_ABS))
>>>>>
>>>>>   (abd
>>>>>     (abs (minus (reg) (reg))))
>>>>>
>>>>>   (intrinsic_abd
>>>>>     (unspec [(reg) (reg)] UNSPEC_ABD))
>>>>>
>>>>>   (aba
>>>>>     (plus (abs (minus (reg) (reg))) (reg)))
>>>>>
>>>>>   (intrinsic_aba
>>>>>     (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg)))
>>>>>
>>>>> which should give us reasonable auto-vectorized code without triggering 
>>>>> any
>>>>> of the issues mapping the semantics of the instructions to intrinsics.
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew Pinski
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> James
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>

Re: [AArch64] Fix integer vabs intrinsics

Reply via email to