On Wed, May 7, 2014 at 12:30 PM, Richard Earnshaw <rearn...@arm.com> wrote: > On 05/05/14 09:04, Richard Biener wrote: >> On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw <rearn...@arm.com> wrote: >>> On 02/05/14 11:28, James Greenhalgh wrote: >>>> On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote: >>>>> >>>>> >>>>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenha...@arm.com> >>>>>> wrote: >>>>>> >>>>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: >>>>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh >>>>>>> <james.greenha...@arm.com> wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as >>>>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should >>>>>>>> behave as >>>>>>>> the hardware. That is to say, the pseudo-code sequence: >>>>>>> >>>>>>> >>>>>>> Only for signed integer types. You should be able to use an unsigned >>>>>>> integer type here instead. >>>>>> >>>>>> If anything, I think that puts us in a worse position. >>>>> >>>>> Not if you cast it back. >>>>> >>>>> >>>>>> The issue that >>>>>> inspires this patch is that GCC will happily fold: >>>>>> >>>>>> t1 = ABS_EXPR (x) >>>>>> t2 = GE_EXPR (t1, 0) >>>>>> >>>>>> to >>>>>> >>>>>> t2 = TRUE >>>>>> >>>>>> Surely an unsigned integer type is going to suffer the same fate? >>>>>> Certainly I >>>>>> can imagine somewhere in the compiler there being a fold path for: >>>>> >>>>> Yes but if add a cast from the unsigned type to the signed type gcc does >>>>> not >>>>> optimize that. If it does it is a bug since the overflow is defined there. >>>> >>>> I'm not sure I understand, are you saying I want to fold to: >>>> >>>> t1 = VIEW_CONVERT_EXPR (x, unsigned) >>>> t2 = ABS_EXPR (t1) >>>> t3 = VIEW_CONVERT_EXPR (t2, signed) >>>> >>>> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each >>>> other out leading to an overall NOP? It might just be Friday morning and a >>>> lack of coffee talking, but I think I need you to spell this one out to >>>> me in big letters! >>>> >>> >>> I agree. I think what you need is a type widening so that you get >>> >>> t1 = VEC_WIDEN (x) >>> t2 = ABS_EXPR (t1) >>> t3 = VEC_NARROW (t2) >>> >>> This then guarantees that the ABS expression cannot be undefined. I'm >>> less sure, however about the narrow causing a change in 'sign'. Has it >>> just punted the problem? Maybe you need >> >> Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED >> result type, thus do abs(int) -> unsigned (what we have as absu_hwi). >> That is, have an ABS_EXPR that doesn't have the undefined issue >> (at expense of optimization in case the result is immediately casted >> back to signed) >> > > Yes, that would make more sense, and is, in effect, what the ARM VABS > instruction is doing (producing an unsigned result with no undefined > behaviour). > > I'm not sure I understand your 'at expense of optimization' comment, > though. Surely a cast back to signed is essentially a no-op, since > there's no representational change in the value (at least, not on 2's > complement machines)?
We can't derive a value range of [0, INT_MAX] for the (int)ABSU_EXPR. Richard. > >> Richard. >> >>> >>> t1 = VEC_WIDEN (x) >>> t2 = ABS_EXPR (t1) >>> t3 = VIEW_CONVERT_EXPR (x, unsigned) >>> t4 = VEC_NARROW (t3) >>> t5 = VIEW_CONVERT_EXPR (t4, signed) >>> >>> !!! >>> >>> How you capture this into RTL during expand, though, is another thing. >>> >>> R. >>> >>>>>> >>>>>> (unsigned >= 0) == TRUE >>>>>> >>>>>>>> >>>>>>>> a = vabs_s8 (vdup_n_s8 (-128)); >>>>>>>> assert (a >= 0); >>>>>>>> >>>>>>>> does not hold. As in hardware >>>>>>>> >>>>>>>> abs (-128) == -128 >>>>>>>> >>>>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we >>>>>>>> should avoid >>>>>>>> it. In fact, we have to be even more careful than that, and keep the >>>>>>>> integer >>>>>>>> vabs intrinsics as an unspec in the back end. >>>>>>> >>>>>>> No it is not. The mistake is to use signed integer types here. Just >>>>>>> add a conversion to an unsigned integer vector and it will work >>>>>>> correctly. >>>>>>> In fact the ABS rtl code is not undefined for the overflow. >>>>>> >>>>>> Here we are covering ourselves against a seperate issue. For >>>>>> auto-vectorized >>>>>> code we want the SABD combine patterns to kick in whenever sensible. For >>>>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an >>>>>> underflow: >>>>>> >>>>>> vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) >>>>>> >>>>>> So in this case, the combine would be erroneous. Likewise SABA. >>>>> >>>>> This sounds like it would problematic for unsigned types and not just for >>>>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8 >>>>> instead. Since in rtl overflow and underflow is defined to be wrapping. >>>> >>>> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point >>>> with unsigned types. Further, I have never thought of RTL having signed >>>> and unsigned types, just a bag of bits. We'll want to use unspec for the >>>> intrinsic version of vabd_s8 - but we'll want to specify the >>>> >>>> (abs (minus (reg) (reg))) >>>> >>>> behaviour so that auto-vectorized code can pick it up. >>>> >>>> So in the end we'll have these patterns: >>>> >>>> (abs >>>> (abs (reg))) >>>> >>>> (intrinsic_abs >>>> (unspec [(reg)] UNSPEC_ABS)) >>>> >>>> (abd >>>> (abs (minus (reg) (reg)))) >>>> >>>> (intrinsic_abd >>>> (unspec [(reg) (reg)] UNSPEC_ABD)) >>>> >>>> (aba >>>> (plus (abs (minus (reg) (reg))) (reg))) >>>> >>>> (intrinsic_aba >>>> (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg))) >>>> >>>> which should give us reasonable auto-vectorized code without triggering any >>>> of the issues mapping the semantics of the instructions to intrinsics. >>>> >>>> Thanks, >>>> James >>>> >>>>> >>>>> Thanks, >>>>> Andrew Pinski >>>>> >>>>>> >>>>>> Thanks, >>>>>> James >>>>> >>>> >>> >>> >> > >