On Tue, Sep 26, 2017 at 2:44 PM, Sudi Das <sudi....@arm.com> wrote: > > Still waiting on Jakub's comment on whether there are more things needed at > the backend. But I have updated the patch according to Richard's comments.
+ (if (TYPE_UNSIGNED(type)) space before '(type)'. + (rshift (lshift @0 @2) @3) + (with + { tree utype = unsigned_type_for (type); } + (convert (rshift (lshift (convert:utype @0) @2) @3)))))) I think your testcase needs a { dg-require-effective-target int32plus } as otherwise it'll fail on AVR. I think the patch is ok with these changes but obviously we should try to address the code-generation issue on x86 at RTL expansion time. They are sort-of existing missing optimizations. Richard. > Thanks > Sudi > > > > From: Richard Biener <richard.guent...@gmail.com> > Sent: Friday, August 4, 2017 11:16 AM > To: Sudi Das > Cc: Wilco Dijkstra; Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James > Greenhalgh > Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x) > > On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das <sudi....@arm.com> wrote: >> >> >> >> >> Sorry about the delayed response but looking at the above discussion, should >> I conclude that this is a valid tree simplification? > > Yes, I think so. Jakub requested code to undo this at RTL expansion > based on target costs, not sure if we really should > require that from you given the user could have written the target > sequence himself. > > Few comments about the patch: > > +/* Fold (1 << (C - x)) where C = precision(type) - 1 > + into ((1 << C) >> x). */ > +(simplify > + (lshift integer_onep@0 (minus INTEGER_CST@1 @2)) > > I think this warrants a single_use check on the minus (note :s isn't enough > as with the unsigned case we'd happily ignore it by design). > > + (if (INTEGRAL_TYPE_P (type) > + && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT > + && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1)) > > You can relax this with using > > && wi::eq_p (@1, TYPE_PRECISION (type) - 1) > > + (if (TYPE_UNSIGNED(type)) > + (rshift (lshift @0 @1) @2) > + (with > + { tree utype = unsigned_type_for (type); } > + (convert:type (rshift (lshift (convert:utype @0) @1) @2)))))) > + > > You can write (convert (rshift ...)), without the :type. > > I'm leaving it to Jakub whether you need to write that RTL expansion tweak. > > Thanks, > Richard. > >> I am pasting the diff of the assembly that AArch64 generates with the test >> case that I added. I see fewer instructions generated with the patch. >> >> --- pr80131-1.s 2017-08-01 10:02:43.243374174 +0100 >> +++ pr80131-1.s-patched 2017-08-01 10:00:54.776455630 +0100 >> @@ -24,10 +24,8 @@ >> str x0, [sp, 8] >> ldr x0, [sp, 8] >> mov w1, w0 >> - mov w0, 63 >> - sub w0, w0, w1 >> - mov x1, 1 >> - lsl x0, x1, x0 >> + mov x0, -9223372036854775808 >> + lsr x0, x0, x1 >> add sp, sp, 16 >> ret >> .size f2, .-f2 >> @@ -39,10 +37,8 @@ >> str x0, [sp, 8] >> ldr x0, [sp, 8] >> mov w1, w0 >> - mov w0, 63 >> - sub w0, w0, w1 >> - mov x1, 1 >> - lsl x0, x1, x0 >> + mov x0, -9223372036854775808 >> + lsr x0, x0, x1 >> add sp, sp, 16 >> ret >> .size f3, .-f3 >> @@ -52,11 +48,9 @@ >> f4: >> sub sp, sp, #16 >> str w0, [sp, 12] >> - mov w1, 31 >> ldr w0, [sp, 12] >> - sub w0, w1, w0 >> - mov w1, 1 >> - lsl w0, w1, w0 >> + mov w1, -2147483648 >> + lsr w0, w1, w0 >> add sp, sp, 16 >> ret >> .size f4, .-f4 >> >> >> Thanks >> >> Sudi >> >> >> >> >> From: Wilco Dijkstra >> Sent: Thursday, April 13, 2017 1:01 PM >> To: Richard Biener; Jakub Jelinek >> Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh >> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x) >> >> Richard Biener wrote: >>> It is IMHO a valid GIMPLE optimization / canonicalization. >>> >>> movabsq $-9223372036854775808, %rax >>> >>> so this should then have been generated as 1<<63? >>> >>> At some point variable shifts were quite expensive as well.. >> >> Yes I don't see a major difference between movabsq and >> >> movl $1, %eax >> salq $63, %rax >> >> on my Sandy Bridge, but if the above is faster then that is what the x64 >> backend should emit - it's 1 byte smaller as well, so probably better in all >> cases. >> >> Wilco >