> The patch tries to use REG_EQUAL to get more precise info for nonzero_bits, > which helps to remove unnecessary zero_extend. > > Here is an example when compiling Coremark, we have rtx like, > > (insn 1244 386 388 47 (set (reg:SI 263 [ D.5767 ]) > (reg:SI 384 [ D.5767 ])) 786 {*thumb2_movsi_insn} > (expr_list:REG_EQUAL (zero_extend:SI (mem:QI (reg/v/f:SI 271 [ memblock > ]) [0 *memblock_13(D)+0 S1 A8])) > (nil))) > > from "reg:SI 384", we can only know it is a 32-bit value. But from > REG_EQUAL, we can know it is an 8-bit value. Then for the following rtx seq, > > (insn 409 407 410 50 (set (reg:SI 308) > (plus:SI (reg:SI 263 [ D.5767 ]) > (const_int -48 [0xffffffffffffffd0]))) core_state.c:170 4 > {*arm_addsi3} > (nil)) > (insn 410 409 411 50 (set (reg:SI 309) > (zero_extend:SI (subreg:QI (reg:SI 308) 0))) core_state.c:170 812 > {thumb2_zero_extendqisi2_v6} > (expr_list:REG_DEAD (reg:SI 308) > (nil))) > > the zero_extend for r309 can be optimized by combine pass.
This sounds like a good idea. > 2014-11-21 Zhenqiang Chen <zhenqiang.c...@arm.com> > > * combine.c (set_nonzero_bits_and_sign_copies): Try REG_EQUAL note. > > diff --git a/gcc/combine.c b/gcc/combine.c > index 6a7d16b..68a883b 100644 > --- a/gcc/combine.c > +++ b/gcc/combine.c > @@ -1713,7 +1713,15 @@ set_nonzero_bits_and_sign_copies (rtx x, const_rtx > set, void *data) > > /* Don't call nonzero_bits if it cannot change anything. */ > if (rsp->nonzero_bits != ~(unsigned HOST_WIDE_INT) 0) > - rsp->nonzero_bits |= nonzero_bits (src, nonzero_bits_mode); > + { > + rtx reg_equal = insn ? find_reg_note (insn, REG_EQUAL, > NULL_RTX) > + : NULL_RTX; > + if (reg_equal) > + rsp->nonzero_bits |= nonzero_bits (XEXP (reg_equal, 0), > + nonzero_bits_mode); > + else > + rsp->nonzero_bits |= nonzero_bits (src, nonzero_bits_mode); > + } > num = num_sign_bit_copies (SET_SRC (set), GET_MODE (x)); > if (rsp->sign_bit_copies == 0 > > || rsp->sign_bit_copies > num) Use find_reg_equal_equiv_note instead. Are you sure that this won't yield inferior results in very peculiar cases? IOW, why not use both sources? Note that 'src' is massaged just above if SHORT_IMMEDIATES_SIGN_EXTEND is defined so we should probably do it for the note datum too, for example by factoring the code into a function and invoking it. Why not do the same for num_sign_bit_copies? -- Eric Botcazou