On Tue, Sep 26, 2017 at 2:44 PM, Sudi Das <sudi....@arm.com> wrote:
>
> Still waiting on Jakub's comment on whether there are more things needed at 
> the backend. But I have updated the patch according to Richard's comments.

+   (if (TYPE_UNSIGNED(type))

space before '(type)'.

+     (rshift (lshift @0 @2) @3)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert (rshift (lshift (convert:utype @0) @2) @3))))))

I think your testcase needs a

{ dg-require-effective-target int32plus }

as otherwise it'll fail on AVR.

I think the patch is ok with these changes but obviously we should try
to address
the code-generation issue on x86 at RTL expansion time.  They are sort-of
existing missing optimizations.

Richard.

> Thanks
> Sudi
>
>
>
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: Friday, August 4, 2017 11:16 AM
> To: Sudi Das
> Cc: Wilco Dijkstra; Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James 
> Greenhalgh
> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
>
> On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das <sudi....@arm.com> wrote:
>>
>>
>>
>>
>> Sorry about the delayed response but looking at the above discussion, should 
>> I conclude that this is a valid tree simplification?
>
> Yes, I think so.  Jakub requested code to undo this at RTL expansion
> based on target costs, not sure if we really should
> require that from you given the user could have written the target
> sequence himself.
>
> Few comments about the patch:
>
> +/* Fold (1 << (C - x)) where C = precision(type) - 1
> +   into ((1 << C) >> x). */
> +(simplify
> + (lshift integer_onep@0 (minus INTEGER_CST@1 @2))
>
> I think this warrants a single_use check on the minus (note :s isn't enough
> as with the unsigned case we'd happily ignore it by design).
>
> +  (if (INTEGRAL_TYPE_P (type)
> +       && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
> +       && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1))
>
> You can relax this with using
>
>           && wi::eq_p (@1, TYPE_PRECISION (type) - 1)
>
> +   (if (TYPE_UNSIGNED(type))
> +     (rshift (lshift @0 @1) @2)
> +   (with
> +    { tree utype = unsigned_type_for (type); }
> +    (convert:type (rshift (lshift (convert:utype @0) @1) @2))))))
> +
>
> You can write (convert (rshift ...)), without the :type.
>
> I'm leaving it to Jakub whether you need to write that RTL expansion tweak.
>
> Thanks,
> Richard.
>
>> I am pasting the diff of the assembly that AArch64 generates with the test 
>> case that I added. I see fewer instructions generated with the patch.
>>
>> --- pr80131-1.s    2017-08-01 10:02:43.243374174 +0100
>> +++ pr80131-1.s-patched    2017-08-01 10:00:54.776455630 +0100
>> @@ -24,10 +24,8 @@
>>      str    x0, [sp, 8]
>>      ldr    x0, [sp, 8]
>>      mov    w1, w0
>> -    mov    w0, 63
>> -    sub    w0, w0, w1
>> -    mov    x1, 1
>> -    lsl    x0, x1, x0
>> +    mov    x0, -9223372036854775808
>> +    lsr    x0, x0, x1
>>      add    sp, sp, 16
>>      ret
>>      .size    f2, .-f2
>> @@ -39,10 +37,8 @@
>>      str    x0, [sp, 8]
>>      ldr    x0, [sp, 8]
>>      mov    w1, w0
>> -    mov    w0, 63
>> -    sub    w0, w0, w1
>> -    mov    x1, 1
>> -    lsl    x0, x1, x0
>> +    mov    x0, -9223372036854775808
>> +    lsr    x0, x0, x1
>>      add    sp, sp, 16
>>      ret
>>      .size    f3, .-f3
>> @@ -52,11 +48,9 @@
>>  f4:
>>      sub    sp, sp, #16
>>      str    w0, [sp, 12]
>> -    mov    w1, 31
>>      ldr    w0, [sp, 12]
>> -    sub    w0, w1, w0
>> -    mov    w1, 1
>> -    lsl    w0, w1, w0
>> +    mov    w1, -2147483648
>> +    lsr    w0, w1, w0
>>      add    sp, sp, 16
>>      ret
>>      .size    f4, .-f4
>>
>>
>> Thanks
>>
>> Sudi
>>
>>
>>
>>
>> From: Wilco Dijkstra
>> Sent: Thursday, April 13, 2017 1:01 PM
>> To: Richard Biener; Jakub Jelinek
>> Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
>> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
>>
>> Richard Biener wrote:
>>> It is IMHO a valid GIMPLE optimization / canonicalization.
>>>
>>>        movabsq $-9223372036854775808, %rax
>>>
>>> so this should then have been generated as 1<<63?
>>>
>>> At some point variable shifts were quite expensive as well..
>>
>> Yes I don't see a major difference between movabsq and
>>
>>         movl    $1, %eax
>>         salq    $63, %rax
>>
>> on my Sandy Bridge, but if the above is faster then that is what the x64
>> backend should emit - it's 1 byte smaller as well, so probably better in all
>> cases.
>>
>> Wilco
>

Reply via email to