https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67288
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with my r204516 change aka
https://gcc.gnu.org/ml/gcc-patches/2013-11/msg00768.html
The changes in the *.optimized dump look reasonable, fewer IVs:

   <bb 3>:
+  # RANGE [16, 4294967280]
+  _18 = _22 * 16;
+  # RANGE [0, 4294967295]
+  _9 = _18 + _6;
+  _3 = (void *) _9;

   <bb 4>:
   # PT = nonlocal
   # ALIGN = 16, MISALIGN = 0
   # addr_23 = PHI <addr_15(5), addr_7(3)>
-  # RANGE [0, 4294967295] NONZERO 0x0000000000fffffff
-  # i_24 = PHI <i_14(5), 0(3)>
   __asm__ __volatile__("dcbf 0, %0" :  : "r" addr_23 : "memory");
-  # RANGE [0, 4294967295] NONZERO 0x0000000000fffffff
-  i_14 = i_24 + 1;
   # PT = nonlocal
   # ALIGN = 16, MISALIGN = 0
   addr_15 = addr_23 + 16;
-  if (i_14 != _22)
+  if (addr_15 != _3)

The * 16 is present in GIMPLE, but the right shift by 4 / division by 16 is
only added later by the doloop pass, so this doesn't seem to be something
that can be optimized in GIMPLE.
On the trunk, we have in *.optimized:

  # RANGE [0, 268435455] NONZERO 268435455
  _20 = size_12 >> 4;
  if (_20 != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 3> [local count: 105119325]:
  # RANGE [16, 4294967280]
  _6 = _20 * 16;
  _4 = _1 + _6;
  _3 = (void *) _4;

  <bb 4> [local count: 955630224]:
  # PT = nonlocal null
  # ALIGN = 16, MISALIGN = 0
  # addr_21 = PHI <addr_10(3), addr_15(4)>
  __asm__ __volatile__("dcbf 0, %0" :  : "r" addr_21 : "memory");
  # PT = nonlocal null
  # ALIGN = 16, MISALIGN = 0
  addr_15 = addr_21 + 16;
  if (_3 != addr_15)
    goto <bb 4>; [89.00%]
  else
    goto <bb 5>; [11.00%]

To optimize this on RTL we'd need to have accurate value range info;
otherwise optimizing (unsigned) ((x << 4) + (cst << 4)) >> 4 into x + cst is
not valid.
Before *.combine we have:

(insn 16 15 17 3 (set (reg:SI 135)
        (ashift:SI (reg:SI 128 [ _20 ])
            (const_int 4 [0x4]))) 269 {ashlsi3}
     (expr_list:REG_DEAD (reg:SI 128 [ _20 ])
        (nil)))
(insn 17 16 42 3 (set (reg/f:SI 122 [ _3 ])
        (plus:SI (reg:SI 135)
            (reg/v/f:SI 127 [ addr ]))) 72 {*addsi3}
     (expr_list:REG_DEAD (reg:SI 135)
        (nil)))
(insn 42 17 43 3 (set (reg:SI 138)
        (minus:SI (reg/f:SI 122 [ _3 ])
            (reg/v/f:SI 127 [ addr ]))) -1
     (expr_list:REG_DEAD (reg/f:SI 122 [ _3 ])
        (nil)))
(insn 43 42 44 3 (set (reg:SI 139)
        (plus:SI (reg:SI 138)
            (const_int -16 [0xfffffffffffffff0]))) -1
     (expr_list:REG_DEAD (reg:SI 138)
        (nil)))
(insn 44 43 45 3 (set (reg:SI 140)
        (lshiftrt:SI (reg:SI 139)
            (const_int 4 [0x4]))) -1
     (expr_list:REG_DEAD (reg:SI 139)
        (nil)))

and only combine turns that into:

(insn 42 17 43 3 (set (reg:SI 138)
        (ashift:SI (reg:SI 128 [ _20 ])
            (const_int 4 [0x4]))) 269 {ashlsi3}
     (expr_list:REG_DEAD (reg:SI 128 [ _20 ])
        (nil)))
(insn 43 42 44 3 (set (reg:SI 139)
        (plus:SI (reg:SI 138)
            (const_int -16 [0xfffffffffffffff0]))) 72 {*addsi3}
     (expr_list:REG_DEAD (reg:SI 138)
        (nil)))
(insn 44 43 45 3 (set (reg:SI 140)
        (lshiftrt:SI (reg:SI 139)
            (const_int 4 [0x4]))) 279 {lshrsi3}
     (expr_list:REG_DEAD (reg:SI 139)
        (nil)))
(insn 45 44 21 3 (set (reg:SI 137)
        (plus:SI (reg:SI 140)
            (const_int 1 [0x1]))) 72 {*addsi3}
     (expr_list:REG_DEAD (reg:SI 140)
        (nil)))

While reg:SI 128 is set only once, and thus in theory we could use the
corresponding GIMPLE SSA_NAME's value range in RTL, the range of _20 is
[0, 268435455] and thus only lets us figure out that << 4 will not shift any
bits out of it (i.e. that (r128 << 4) >> 4 is equal to r128).
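To make that concrete, here is a minimal standalone C sketch (not part of
this PR's testcase); 0x10000000 is chosen to lie just outside _20's known
range [0, 268435455], and for it rewriting ((x << 4) + (cst << 4)) >> 4 as
x + cst gives a different answer because the shifts discard x's upper four
bits:

#include <stdio.h>

int main (void)
{
  unsigned int x = 0x10000000;  /* just outside [0, 268435455] */
  unsigned int cst = 1;
  unsigned int folded = ((x << 4) + (cst << 4)) >> 4;  /* x's high bits lost */
  unsigned int direct = x + cst;
  printf ("folded=%#x direct=%#x\n", folded, direct);  /* 0x1 vs 0x10000001 */
  return 0;
}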
We need to know that it can't be zero as well, which on GIMPLE is present in
the value range of _6, [16, 4294967280], but unfortunately that info is lost
during TER; pseudo 135 doesn't really have REG_EXPR set.  Maybe that would be
fixable by some expander work.  Then the question is whether we actually can
use GIMPLE VRP info during RTL optimizations, and whether all RTL
optimizations that would invalidate it reset REG_EXPR or could in some other
way signal that the VRP info can't be trusted.  The combiner first optimizes:

Trying 17 -> 42:
   17: r122:SI=r135:SI+r127:SI
      REG_DEAD r135:SI
   42: r138:SI=r122:SI-r127:SI
      REG_DEAD r122:SI
Successfully matched this instruction:
(set (reg:SI 138)
    (reg:SI 135))

and then:

Trying 16 -> 42:
   16: r135:SI=r128:SI<<0x4
      REG_DEAD r128:SI
   42: r138:SI=r135:SI
      REG_DEAD r135:SI
Successfully matched this instruction:
(set (reg:SI 138)
    (ashift:SI (reg:SI 128 [ _20 ])
        (const_int 4 [0x4])))

so if we had VRP info on _6 aka (reg:SI 135 [ _6 ]), we'd need to signal that
the same range holds for r138, and then have some way to query it and
somewhere to perform the optimization.
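For reference, a standalone C sketch (again not from the PR) of what such an
optimization would have to prove: insns 42-45 above compute
(((x << 4) - 16) >> 4) + 1, which collapses back to x whenever x << 4
neither is zero nor loses bits, i.e. exactly what _6's GIMPLE range
[16, 4294967280] guarantees:

#include <stdio.h>

/* The computation combine leaves behind in insns 42-45.  */
static unsigned int doloop_count (unsigned int x)
{
  return (((x << 4) - 16) >> 4) + 1;
}

int main (void)
{
  printf ("%#x\n", doloop_count (5));  /* 0x5: equals x, since 5 << 4
                                          is in [16, 4294967280] */
  printf ("%#x\n", doloop_count (0));  /* 0x10000000: the subtraction
                                          wraps, so the result is not 0 */
  return 0;
}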