https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91523
            Bug ID: 91523
           Summary: Register allocation picks sub-optimal alternative with scratch registers
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, ra
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rearnsha at gcc dot gnu.org
                CC: vmakarov at redhat dot com, wdijkstr at arm dot com
  Target Milestone: ---
            Target: arm-none-eabi

Consider this simple testcase:

_Bool f (unsigned a, unsigned b)
{
  return a | b;
}

When compiled for arm-eabi with options -O2 -march=armv7-a -mthumb, the compiler generates the following code sequence:

        orrs    r3, r0, r1      @ 8     [c=4 l=4]  *iorsi3_compare0_scratch/2
        ite     ne
        movne   r0, #1  @ 24    [c=8 l=6]  *p *thumb2_movsi_insn/1
        moveq   r0, #0  @ 25    [c=8 l=6]  *p *thumb2_movsi_insn/1
        bx      lr      @ 28    [c=8 l=4]  *thumb2_return

which is correct, but sub-optimal. The pattern selected for the orrs has picked a sub-optimal alternative. The pattern being matched is

(define_insn "*iorsi3_compare0_scratch"
  [(set (reg:CC_NOOV CC_REGNUM)
        (compare:CC_NOOV
         (ior:SI (match_operand:SI 1 "s_register_operand" "%r,0,r")
                 (match_operand:SI 2 "arm_rhs_operand" "I,l,r"))
         (const_int 0)))
   (clobber (match_scratch:SI 0 "=r,l,r"))]
  "TARGET_32BIT"
  "orrs%?\\t%0, %1, %2"
  [(set_attr "conds" "set")
   (set_attr "arch" "*,t2,*")
   (set_attr "length" "4,2,4")
   (set_attr "type" "logics_imm,logics_reg,logics_reg")]
)

and the insn matching it is

(insn 7 21 22 2 (parallel [
            (set (reg:CC_NOOV 100 cc)
                (compare:CC_NOOV (ior:SI (reg:SI 119)
                        (reg:SI 120))
                    (const_int 0 [0])))
            (clobber (scratch:SI))
        ]) "/home/rearnsha/work/pdtools/gcc-tests/cmpdi.c":3:35 96 {*iorsi3_compare0_scratch}
     (expr_list:REG_DEAD (reg:SI 120)
        (expr_list:REG_DEAD (reg:SI 119)
            (nil))))

Given that both input operands are dead, there is no reason why the compiler can't pick alternative 1 from the insn and use a scratch that clobbers one of the inputs (either directly, or via a commutative swap).
However, it insists on allocating a different register (r3) and then uses alternative 2, which is more expensive (a 32-bit insn instead of a 16-bit one). Alternatives are numbered from 0 here, matching the /2 suffix in the assembly output.

Note that the pattern shown above was updated in r274822 to add the Thumb-2 alternative that we want to match here.
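
To make the expectation concrete, here is a toy model in C of the choice I would like to see. It is only a sketch and is not how IRA/LRA actually cost alternatives: struct operand, alt_usable and pick_alternative are hypothetical names invented for this illustration, and the model assumes any immediate has already been placed in operand 2. It encodes the three constraint alternatives and "length" values from the pattern above, and picks the shortest alternative that needs no extra moves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one input operand at the insn. */
struct operand {
    bool is_imm;   /* a constant valid for constraint "I" */
    bool is_low;   /* held in a low register (constraint "l") */
    bool is_dead;  /* carries a REG_DEAD note on this insn */
};

/* "length" attribute of each alternative: "4,2,4". */
static int alt_length(int alt)
{
    static const int len[3] = { 4, 2, 4 };
    assert(alt >= 0 && alt < 3);
    return len[alt];
}

/* True if alternative ALT can be used without any extra moves. */
static bool alt_usable(int alt, struct operand op1, struct operand op2,
                       bool thumb2)
{
    switch (alt) {
    case 0:  /* "=r", "%r", "I" */
        return !op1.is_imm && op2.is_imm;
    case 1:  /* "=l", "0", "l", arch t2 */
        if (!thumb2 || op1.is_imm || op2.is_imm)
            return false;
        if (!op1.is_low || !op2.is_low)
            return false;
        /* The scratch overwrites the tied input, so that input must be
           dead.  The '%' modifier lets either input be the tied one. */
        return op1.is_dead || op2.is_dead;
    case 2:  /* "=r", "r", "r" */
        return !op1.is_imm && !op2.is_imm;
    default:
        return false;
    }
}

/* Index of the shortest usable alternative, or -1 if none is usable. */
int pick_alternative(struct operand op1, struct operand op2, bool thumb2)
{
    int best = -1;
    for (int alt = 0; alt < 3; alt++) {
        if (!alt_usable(alt, op1, op2, thumb2))
            continue;
        if (best < 0 || alt_length(alt) < alt_length(best))
            best = alt;
    }
    return best;
}
```

With both inputs in low registers and both dead, as in insn 7, this model returns alternative 1 (the 16-bit form). It falls back to alternative 2 only when neither input is dead or when not compiling for Thumb-2.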