On 05/05/14 09:04, Richard Biener wrote: > On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw <rearn...@arm.com> wrote: >> On 02/05/14 11:28, James Greenhalgh wrote: >>> On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote: >>>> >>>> >>>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenha...@arm.com> >>>>> wrote: >>>>> >>>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: >>>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh >>>>>> <james.greenha...@arm.com> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as >>>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave >>>>>>> as >>>>>>> the hardware. That is to say, the pseudo-code sequence: >>>>>> >>>>>> >>>>>> Only for signed integer types. You should be able to use an unsigned >>>>>> integer type here instead. >>>>> >>>>> If anything, I think that puts us in a worse position. >>>> >>>> Not if you cast it back. >>>> >>>> >>>>> The issue that >>>>> inspires this patch is that GCC will happily fold: >>>>> >>>>> t1 = ABS_EXPR (x) >>>>> t2 = GE_EXPR (t1, 0) >>>>> >>>>> to >>>>> >>>>> t2 = TRUE >>>>> >>>>> Surely an unsigned integer type is going to suffer the same fate? >>>>> Certainly I >>>>> can imagine somewhere in the compiler there being a fold path for: >>>> >>>> Yes but if add a cast from the unsigned type to the signed type gcc does >>>> not >>>> optimize that. If it does it is a bug since the overflow is defined there. >>> >>> I'm not sure I understand, are you saying I want to fold to: >>> >>> t1 = VIEW_CONVERT_EXPR (x, unsigned) >>> t2 = ABS_EXPR (t1) >>> t3 = VIEW_CONVERT_EXPR (t2, signed) >>> >>> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each >>> other out leading to an overall NOP? It might just be Friday morning and a >>> lack of coffee talking, but I think I need you to spell this one out to >>> me in big letters! >>> >> >> I agree. I think what you need is a type widening so that you get >> >> t1 = VEC_WIDEN (x) >> t2 = ABS_EXPR (t1) >> t3 = VEC_NARROW (t2) >> >> This then guarantees that the ABS expression cannot be undefined. I'm >> less sure, however about the narrow causing a change in 'sign'. Has it >> just punted the problem? Maybe you need > > Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED > result type, thus do abs(int) -> unsigned (what we have as absu_hwi). > That is, have an ABS_EXPR that doesn't have the undefined issue > (at expense of optimization in case the result is immediately casted > back to signed) >
Yes, that would make more sense, and is, in effect, what the ARM VABS instruction is doing (producing an unsigned result with no undefined behaviour). I'm not sure I understand your 'at expense of optimization' comment, though. Surely a cast back to signed is essentially a no-op, since there's no representational change in the value (at least, not on 2's complement machines)? > Richard. > >> >> t1 = VEC_WIDEN (x) >> t2 = ABS_EXPR (t1) >> t3 = VIEW_CONVERT_EXPR (x, unsigned) >> t4 = VEC_NARROW (t3) >> t5 = VIEW_CONVERT_EXPR (t4, signed) >> >> !!! >> >> How you capture this into RTL during expand, though, is another thing. >> >> R. >> >>>>> >>>>> (unsigned >= 0) == TRUE >>>>> >>>>>>> >>>>>>> a = vabs_s8 (vdup_n_s8 (-128)); >>>>>>> assert (a >= 0); >>>>>>> >>>>>>> does not hold. As in hardware >>>>>>> >>>>>>> abs (-128) == -128 >>>>>>> >>>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should >>>>>>> avoid >>>>>>> it. In fact, we have to be even more careful than that, and keep the >>>>>>> integer >>>>>>> vabs intrinsics as an unspec in the back end. >>>>>> >>>>>> No it is not. The mistake is to use signed integer types here. Just >>>>>> add a conversion to an unsigned integer vector and it will work >>>>>> correctly. >>>>>> In fact the ABS rtl code is not undefined for the overflow. >>>>> >>>>> Here we are covering ourselves against a seperate issue. For >>>>> auto-vectorized >>>>> code we want the SABD combine patterns to kick in whenever sensible. For >>>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an >>>>> underflow: >>>>> >>>>> vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) >>>>> >>>>> So in this case, the combine would be erroneous. Likewise SABA. >>>> >>>> This sounds like it would problematic for unsigned types and not just for >>>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8 >>>> instead. Since in rtl overflow and underflow is defined to be wrapping. >>> >>> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point >>> with unsigned types. Further, I have never thought of RTL having signed >>> and unsigned types, just a bag of bits. We'll want to use unspec for the >>> intrinsic version of vabd_s8 - but we'll want to specify the >>> >>> (abs (minus (reg) (reg))) >>> >>> behaviour so that auto-vectorized code can pick it up. >>> >>> So in the end we'll have these patterns: >>> >>> (abs >>> (abs (reg))) >>> >>> (intrinsic_abs >>> (unspec [(reg)] UNSPEC_ABS)) >>> >>> (abd >>> (abs (minus (reg) (reg)))) >>> >>> (intrinsic_abd >>> (unspec [(reg) (reg)] UNSPEC_ABD)) >>> >>> (aba >>> (plus (abs (minus (reg) (reg))) (reg))) >>> >>> (intrinsic_aba >>> (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg))) >>> >>> which should give us reasonable auto-vectorized code without triggering any >>> of the issues mapping the semantics of the instructions to intrinsics. >>> >>> Thanks, >>> James >>> >>>> >>>> Thanks, >>>> Andrew Pinski >>>> >>>>> >>>>> Thanks, >>>>> James >>>> >>> >> >> >