On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote: > Hi all, > > In this PR we want to optimise: > int foo (int i) > { > return (i == 0) ? N : __builtin_clz (i); > } > > on targets where CLZ is defined at zero to the constant 'N'. > This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO > macro. > The obvious place to implement this would be in combine through simplify-rtx > where we'd > recognise an IF_THEN_ELSE of the form: > (set (reg:SI r1) > (if_then_else:SI (ne (reg:SI r2) > (const_int 0 [0])) > (clz:SI (reg:SI r2)) > (const_int 32))) > > and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify > it into > just (clz:SI (reg:SI r2)). > However, I found this doesn't quite happen for a couple of reasons: > 1) This depends on ifcvt or some other pass to have created a conditional > move of the > two branches that provide the IF_THEN_ELSE to propagate the const_int and > clz operation into. > > 2) Combine will refuse to propagate r2 from the above example into both the > condition and the > CLZ at the same time, so the most we see is: > (set (reg:SI r1) > (if_then_else:SI (ne (reg:CC cc) > (const_int 0)) > (clz:SI (reg:SI r2)) > (const_int 32))) > > which is not enough information to perform the simplification. > > This patch implements the optimisation in ce1 using the noce ifcvt > framework. > During ifcvt noce_process_if_block can see that we're trying to optimise > something > of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all > the information > needed to perform the transformation. > > The transformation is performed by adding a new noce_try* function that > tries to put the > condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to > simplify that > using the simplify-rtx machinery. That way, we can implement the > simplification logic in > simplify-rtx.c where it belongs. > > A similar transformation for CTZ is implemented as well. > So for code: > int foo (int i) > { > return (i == 0) ? 32 : __builtin_clz (i); > } > > On aarch64 we now emit: > foo: > clz w0, w0 > ret > > instead of: > foo: > mov w1, 32 > clz w2, w0 > cmp w0, 0 > csel w0, w2, w1, ne > ret > > and for arm similarly we generate: > foo: > clz r0, r0 > bx lr > > instead of: > foo: > cmp r0, #0 > clzne r0, r0 > moveq r0, #32 > bx lr > > > and for x86_64 with -O2 -mlzcnt we generate: > foo: > xorl %eax, %eax > lzcntl %edi, %eax > ret > > instead of: > foo: > xorl %eax, %eax > movl $32, %edx > lzcntl %edi, %eax > testl %edi, %edi > cmove %edx, %eax > ret > > > I tried getting this to work on other targets as well, but encountered > difficulties. > For example on powerpc the two arms of the condition seen during ifcvt are: > > (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ]) > (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64} > (nil)) > and > (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0) > (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2} > (expr_list:REG_DEAD (reg/v:DI 157 [ i ]) > (nil))) > > So the setup code in noce_process_if_block sees that the set destination is > not the same > ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)) > so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check. > I suppose that's a consequence of how SImode operations are represented in > early RTL > on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be > taught to handle > destinations that are subregs of one another, but that would be a separate > patch. > > Anyway, is this patch ok for trunk? > > Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, > x86_64-pc-linux-gnu. > > Thanks, > Kyrill > > 2016-05-26 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > PR middle-end/37780 > * ifcvt.c (noce_try_ifelse_collapse): New function. > Declare prototype. > (noce_process_if_block): Call noce_try_ifelse_collapse. > * simplify-rtx.c (simplify_cond_clz_ctz): New function. > (simplify_ternary_operation): Use the above to simplify > conditional CLZ/CTZ expressions. > > 2016-05-26 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > PR middle-end/37780 > * gcc.c-torture/execute/pr37780.c: New test. > * gcc.target/aarch64/pr37780_1.c: Likewise. > * gcc.target/arm/pr37780_1.c: Likewise.
Hi Kyrylo, I've noticed that gcc.target/arm/pr37780_1.c fails on arm if arch < v6. I first tried to fix the effective-target guard (IIRC, the doc says clz is available starting with v5t), but that isn't sufficient. When compiling for armv5-t, the scan-assembler directives fail. It seems to work with v6t2, so I am wondering whether it's just a matter of increasing the effective-target arch version, or if you really intended to make the test pass on these old architectures? Thanks, Christophe.