On 07/05/14 11:32, Richard Biener wrote: > On Wed, May 7, 2014 at 12:30 PM, Richard Earnshaw <rearn...@arm.com> wrote: >> On 05/05/14 09:04, Richard Biener wrote: >>> On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw <rearn...@arm.com> wrote: >>>> On 02/05/14 11:28, James Greenhalgh wrote: >>>>> On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote: >>>>>> >>>>>> >>>>>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenha...@arm.com> >>>>>>> wrote: >>>>>>> >>>>>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: >>>>>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh >>>>>>>> <james.greenha...@arm.com> wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as >>>>>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should >>>>>>>>> behave as >>>>>>>>> the hardware. That is to say, the pseudo-code sequence: >>>>>>>> >>>>>>>> >>>>>>>> Only for signed integer types. You should be able to use an unsigned >>>>>>>> integer type here instead. >>>>>>> >>>>>>> If anything, I think that puts us in a worse position. >>>>>> >>>>>> Not if you cast it back. >>>>>> >>>>>> >>>>>>> The issue that >>>>>>> inspires this patch is that GCC will happily fold: >>>>>>> >>>>>>> t1 = ABS_EXPR (x) >>>>>>> t2 = GE_EXPR (t1, 0) >>>>>>> >>>>>>> to >>>>>>> >>>>>>> t2 = TRUE >>>>>>> >>>>>>> Surely an unsigned integer type is going to suffer the same fate? >>>>>>> Certainly I >>>>>>> can imagine somewhere in the compiler there being a fold path for: >>>>>> >>>>>> Yes but if add a cast from the unsigned type to the signed type gcc does >>>>>> not >>>>>> optimize that. If it does it is a bug since the overflow is defined >>>>>> there. >>>>> >>>>> I'm not sure I understand, are you saying I want to fold to: >>>>> >>>>> t1 = VIEW_CONVERT_EXPR (x, unsigned) >>>>> t2 = ABS_EXPR (t1) >>>>> t3 = VIEW_CONVERT_EXPR (t2, signed) >>>>> >>>>> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each >>>>> other out leading to an overall NOP? It might just be Friday morning and a >>>>> lack of coffee talking, but I think I need you to spell this one out to >>>>> me in big letters! >>>>> >>>> >>>> I agree. I think what you need is a type widening so that you get >>>> >>>> t1 = VEC_WIDEN (x) >>>> t2 = ABS_EXPR (t1) >>>> t3 = VEC_NARROW (t2) >>>> >>>> This then guarantees that the ABS expression cannot be undefined. I'm >>>> less sure, however about the narrow causing a change in 'sign'. Has it >>>> just punted the problem? Maybe you need >>> >>> Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED >>> result type, thus do abs(int) -> unsigned (what we have as absu_hwi). >>> That is, have an ABS_EXPR that doesn't have the undefined issue >>> (at expense of optimization in case the result is immediately casted >>> back to signed) >>> >> >> Yes, that would make more sense, and is, in effect, what the ARM VABS >> instruction is doing (producing an unsigned result with no undefined >> behaviour). >> >> I'm not sure I understand your 'at expense of optimization' comment, >> though. Surely a cast back to signed is essentially a no-op, since >> there's no representational change in the value (at least, not on 2's >> complement machines)? > > We can't derive a value range of [0, INT_MAX] for the (int)ABSU_EXPR. >
Unless you're assuming that ABS_EXPR(INT_MIN) will always trap, then if you can derive it for ABS_EXPR (which really returns [0, INT_MAX]+UNSPECIFIED, I don't really see why you can't derive it for (int)ABSU_EXPR, which returns [0, INT_MAX]+INT_MIN, since the latter is a subset of the former). R. > Richard. > >> >>> Richard. >>> >>>> >>>> t1 = VEC_WIDEN (x) >>>> t2 = ABS_EXPR (t1) >>>> t3 = VIEW_CONVERT_EXPR (x, unsigned) >>>> t4 = VEC_NARROW (t3) >>>> t5 = VIEW_CONVERT_EXPR (t4, signed) >>>> >>>> !!! >>>> >>>> How you capture this into RTL during expand, though, is another thing. >>>> >>>> R. >>>> >>>>>>> >>>>>>> (unsigned >= 0) == TRUE >>>>>>> >>>>>>>>> >>>>>>>>> a = vabs_s8 (vdup_n_s8 (-128)); >>>>>>>>> assert (a >= 0); >>>>>>>>> >>>>>>>>> does not hold. As in hardware >>>>>>>>> >>>>>>>>> abs (-128) == -128 >>>>>>>>> >>>>>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we >>>>>>>>> should avoid >>>>>>>>> it. In fact, we have to be even more careful than that, and keep the >>>>>>>>> integer >>>>>>>>> vabs intrinsics as an unspec in the back end. >>>>>>>> >>>>>>>> No it is not. The mistake is to use signed integer types here. Just >>>>>>>> add a conversion to an unsigned integer vector and it will work >>>>>>>> correctly. >>>>>>>> In fact the ABS rtl code is not undefined for the overflow. >>>>>>> >>>>>>> Here we are covering ourselves against a seperate issue. For >>>>>>> auto-vectorized >>>>>>> code we want the SABD combine patterns to kick in whenever sensible. For >>>>>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an >>>>>>> underflow: >>>>>>> >>>>>>> vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) >>>>>>> >>>>>>> So in this case, the combine would be erroneous. Likewise SABA. >>>>>> >>>>>> This sounds like it would problematic for unsigned types and not just >>>>>> for >>>>>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8 >>>>>> instead. Since in rtl overflow and underflow is defined to be wrapping. >>>>> >>>>> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point >>>>> with unsigned types. Further, I have never thought of RTL having signed >>>>> and unsigned types, just a bag of bits. We'll want to use unspec for the >>>>> intrinsic version of vabd_s8 - but we'll want to specify the >>>>> >>>>> (abs (minus (reg) (reg))) >>>>> >>>>> behaviour so that auto-vectorized code can pick it up. >>>>> >>>>> So in the end we'll have these patterns: >>>>> >>>>> (abs >>>>> (abs (reg))) >>>>> >>>>> (intrinsic_abs >>>>> (unspec [(reg)] UNSPEC_ABS)) >>>>> >>>>> (abd >>>>> (abs (minus (reg) (reg)))) >>>>> >>>>> (intrinsic_abd >>>>> (unspec [(reg) (reg)] UNSPEC_ABD)) >>>>> >>>>> (aba >>>>> (plus (abs (minus (reg) (reg))) (reg))) >>>>> >>>>> (intrinsic_aba >>>>> (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg))) >>>>> >>>>> which should give us reasonable auto-vectorized code without triggering >>>>> any >>>>> of the issues mapping the semantics of the instructions to intrinsics. >>>>> >>>>> Thanks, >>>>> James >>>>> >>>>>> >>>>>> Thanks, >>>>>> Andrew Pinski >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> James >>>>>> >>>>> >>>> >>>> >>> >> >> >