> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenha...@arm.com> wrote: > >> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: >> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh >> <james.greenha...@arm.com> wrote: >>> >>> Hi, >>> >>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as >>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as >>> the hardware. That is to say, the pseudo-code sequence: >> >> >> Only for signed integer types. You should be able to use an unsigned >> integer type here instead. > > If anything, I think that puts us in a worse position.
Not if you cast it back. > The issue that > inspires this patch is that GCC will happily fold: > > t1 = ABS_EXPR (x) > t2 = GE_EXPR (t1, 0) > > to > > t2 = TRUE > > Surely an unsigned integer type is going to suffer the same fate? Certainly I > can imagine somewhere in the compiler there being a fold path for: Yes but if add a cast from the unsigned type to the signed type gcc does not optimize that. If it does it is a bug since the overflow is defined there. > > (unsigned >= 0) == TRUE > >>> >>> a = vabs_s8 (vdup_n_s8 (-128)); >>> assert (a >= 0); >>> >>> does not hold. As in hardware >>> >>> abs (-128) == -128 >>> >>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should >>> avoid >>> it. In fact, we have to be even more careful than that, and keep the integer >>> vabs intrinsics as an unspec in the back end. >> >> No it is not. The mistake is to use signed integer types here. Just >> add a conversion to an unsigned integer vector and it will work >> correctly. >> In fact the ABS rtl code is not undefined for the overflow. > > Here we are covering ourselves against a seperate issue. For auto-vectorized > code we want the SABD combine patterns to kick in whenever sensible. For > intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow: > > vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) > > So in this case, the combine would be erroneous. Likewise SABA. This sounds like it would problematic for unsigned types and not just for vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8 instead. Since in rtl overflow and underflow is defined to be wrapping. Thanks, Andrew Pinski > > Thanks, > James