On Wed, May 7, 2014 at 12:30 PM, Richard Earnshaw <rearn...@arm.com> wrote:
> On 05/05/14 09:04, Richard Biener wrote:
>> On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw <rearn...@arm.com> wrote:
>>> On 02/05/14 11:28, James Greenhalgh wrote:
>>>> On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote:
>>>>>
>>>>>
>>>>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenha...@arm.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>>>>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>>>>>>> <james.greenha...@arm.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>>>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should 
>>>>>>>> behave as
>>>>>>>> the hardware. That is to say, the pseudo-code sequence:
>>>>>>>
>>>>>>>
>>>>>>> Only for signed integer types.  You should be able to use an unsigned
>>>>>>> integer type here instead.
>>>>>>
>>>>>> If anything, I think that puts us in a worse position.
>>>>>
>>>>> Not if you cast it back.
>>>>>
>>>>>
>>>>>> The issue that
>>>>>> inspires this patch is that GCC will happily fold:
>>>>>>
>>>>>>  t1 = ABS_EXPR (x)
>>>>>>  t2 = GE_EXPR (t1, 0)
>>>>>>
>>>>>> to
>>>>>>
>>>>>>  t2 = TRUE
>>>>>>
>>>>>> Surely an unsigned integer type is going to suffer the same fate? 
>>>>>> Certainly I
>>>>>> can imagine somewhere in the compiler there being a fold path for:
>>>>>
>>>>> Yes but if add a cast from the unsigned type to the signed type gcc does 
>>>>> not
>>>>> optimize that. If it does it is a bug since the overflow is defined there.
>>>>
>>>> I'm not sure I understand, are you saying I want to fold to:
>>>>
>>>>   t1 = VIEW_CONVERT_EXPR (x, unsigned)
>>>>   t2 = ABS_EXPR (t1)
>>>>   t3 = VIEW_CONVERT_EXPR (t2, signed)
>>>>
>>>> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
>>>> other out leading to an overall NOP? It might just be Friday morning and a
>>>> lack of coffee talking, but I think I need you to spell this one out to
>>>> me in big letters!
>>>>
>>>
>>> I agree.  I think what you need is a type widening so that you get
>>>
>>> t1 = VEC_WIDEN (x)
>>> t2 = ABS_EXPR (t1)
>>> t3 = VEC_NARROW (t2)
>>>
>>> This then guarantees that the ABS expression cannot be undefined.  I'm
>>> less sure, however about the narrow causing a change in 'sign'.  Has it
>>> just punted the problem?  Maybe you need
>>
>> Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED
>> result type, thus do abs(int) -> unsigned (what we have as absu_hwi).
>> That is, have an ABS_EXPR that doesn't have the undefined issue
>> (at expense of optimization in case the result is immediately casted
>> back to signed)
>>
>
> Yes, that would make more sense, and is, in effect, what the ARM VABS
> instruction is doing (producing an unsigned result with no undefined
> behaviour).
>
> I'm not sure I understand your 'at expense of optimization' comment,
> though.  Surely a cast back to signed is essentially a no-op, since
> there's no representational change in the value (at least, not on 2's
> complement machines)?

We can't derive a value range of [0, INT_MAX] for the (int)ABSU_EXPR.

Richard.

>
>> Richard.
>>
>>>
>>> t1 = VEC_WIDEN (x)
>>> t2 = ABS_EXPR (t1)
>>> t3 = VIEW_CONVERT_EXPR (x, unsigned)
>>> t4 = VEC_NARROW (t3)
>>> t5 = VIEW_CONVERT_EXPR (t4, signed)
>>>
>>> !!!
>>>
>>> How you capture this into RTL during expand, though, is another thing.
>>>
>>> R.
>>>
>>>>>>
>>>>>>  (unsigned >= 0) == TRUE
>>>>>>
>>>>>>>>
>>>>>>>>  a = vabs_s8 (vdup_n_s8 (-128));
>>>>>>>>  assert (a >= 0);
>>>>>>>>
>>>>>>>> does not hold. As in hardware
>>>>>>>>
>>>>>>>>  abs (-128) == -128
>>>>>>>>
>>>>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we 
>>>>>>>> should avoid
>>>>>>>> it. In fact, we have to be even more careful than that, and keep the 
>>>>>>>> integer
>>>>>>>> vabs intrinsics as an unspec in the back end.
>>>>>>>
>>>>>>> No it is not.  The mistake is to use signed integer types here.  Just
>>>>>>> add a conversion to an unsigned integer vector and it will work
>>>>>>> correctly.
>>>>>>> In fact the ABS rtl code is not undefined for the overflow.
>>>>>>
>>>>>> Here we are covering ourselves against a seperate issue. For 
>>>>>> auto-vectorized
>>>>>> code we want the SABD combine patterns to kick in whenever sensible. For
>>>>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an 
>>>>>> underflow:
>>>>>>
>>>>>>  vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>>>>>>
>>>>>> So in this case, the combine would be erroneous. Likewise SABA.
>>>>>
>>>>> This sounds like it would problematic for unsigned types  and not just for
>>>>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
>>>>> instead. Since in rtl overflow and underflow is defined to be wrapping.
>>>>
>>>> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
>>>> with unsigned types. Further, I have never thought of RTL having signed
>>>> and unsigned types, just a bag of bits. We'll want to use unspec for the
>>>> intrinsic version of vabd_s8 - but we'll want to specify the
>>>>
>>>>   (abs (minus (reg) (reg)))
>>>>
>>>> behaviour so that auto-vectorized code can pick it up.
>>>>
>>>> So in the end we'll have these patterns:
>>>>
>>>>   (abs
>>>>     (abs (reg)))
>>>>
>>>>   (intrinsic_abs
>>>>     (unspec [(reg)] UNSPEC_ABS))
>>>>
>>>>   (abd
>>>>     (abs (minus (reg) (reg))))
>>>>
>>>>   (intrinsic_abd
>>>>     (unspec [(reg) (reg)] UNSPEC_ABD))
>>>>
>>>>   (aba
>>>>     (plus (abs (minus (reg) (reg))) (reg)))
>>>>
>>>>   (intrinsic_aba
>>>>     (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg)))
>>>>
>>>> which should give us reasonable auto-vectorized code without triggering any
>>>> of the issues mapping the semantics of the instructions to intrinsics.
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>>>
>>>>> Thanks,
>>>>> Andrew Pinski
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> James
>>>>>
>>>>
>>>
>>>
>>
>
>

Reply via email to