------- Additional Comments From steven at gcc dot gnu dot org  2005-03-06 22:14 -------
Just to give people an idea of how close we are to optimizing well enough
that the calls to fold_rtx in CSE are almost all no-ops, here are some
numbers taken over all cc1-i files on amd64:

Number of times fold_rtx is called: 13882333
Number of times it returns something other than the incoming rtx x: 70001
(roughly 0.5%)

Number of times fold_rtx is called by other functions than itself: 9323647
Number of times it returns something other than x: 8526 (roughly 0.09%)

A few rtxes that fold_rtx handles:

Loads from the constant pool:

Trying to fold rtx: (float_extend:DF (mem/u/i:SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2 S4 A32]))
Trying to fold rtx: (mem/u/i:SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2 S4 A32])
Trying to fold rtx: (symbol_ref/u:DI ("*.LC0") [flags 0x2])
Returning X unchanged.
Returning new rtx: (const_double:SF 1.0e+0 [0x0.8p+1])
Returning new rtx: (const_double:DF 1.0e+0 [0x0.8p+1])

Folded jumps:

Trying to fold rtx: (if_then_else (eq (reg:CCZ 17 flags) (const_int 0 [0x0])) (label_ref 73) (pc))
Trying to fold rtx: (pc)
Returning X unchanged.
Trying to fold rtx: (eq (reg:CCZ 17 flags) (const_int 0 [0x0]))
Trying to fold rtx: (reg:SI 66 [ D.10402 ])
Returning X unchanged.
Trying to fold rtx: (const_int 4 [0x4])
Returning X unchanged.
Returning new rtx: (const_int 1 [0x1])
Returning new rtx: (label_ref 73)

Apparently an equivalent expression with lower cost:

Trying to fold rtx: (plus:QI (subreg:QI (reg:SI 251) 0) (subreg:QI (reg:SI 251) 0))
Trying to fold rtx: (subreg:QI (reg:SI 251) 0)
Trying to fold rtx: (reg:SI 251)
Returning X unchanged.
Returning X unchanged.
Trying to fold rtx: (subreg:QI (reg:SI 251) 0)
Trying to fold rtx: (reg:SI 251)
Returning X unchanged.
Returning X unchanged.
Returning new rtx: (ashift:QI (subreg:QI (reg:SI 251) 0) (const_int 1 [0x1]))

Likewise:

Trying to fold rtx: (mult:DI (reg:DI 63 [ <variable>.comb_vect.length ]) (const_int 4 [0x4]))
Returning new rtx: (ashift:DI (reg:DI 63 [ <variable>.comb_vect.length ]) (const_int 2 [0x2]))

It would be interesting to find out how many of these things the combine pass
and the later CSE passes would catch (or miss), and how the tree-cleanup-branch
compares.  I will look at the latter first.
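
For the last two dumps, the transformation being applied is ordinary strength
reduction: a multiplication by a constant power of two (and x + x, which is
x * 2) is rewritten as a left shift by the base-2 logarithm of the constant,
because the shift form is cheaper by rtx cost.  The small standalone C sketch
below only illustrates the power-of-two test and the resulting shift count;
it is not GCC code, and the helper name power_of_two_log2 is made up for this
example (GCC does the equivalent rewrite on rtx expressions, not on plain
integers).

#include <stdio.h>

/* Return log2 (x) if x is a nonzero power of two, -1 otherwise.
   Illustrative only; GCC has its own helpers for this.  */
static int
power_of_two_log2 (unsigned long x)
{
  int shift = 0;

  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while ((x >>= 1) != 0)
    shift++;
  return shift;
}

int
main (void)
{
  unsigned long multipliers[] = { 2, 4, 6 };
  int i;

  for (i = 0; i < 3; i++)
    {
      int shift = power_of_two_log2 (multipliers[i]);

      /* (mult x 4) becomes (ashift x 2); (mult x 6) is not a power
         of two, so no shift rewrite applies.  */
      if (shift >= 0)
        printf ("(mult x %lu) folds to (ashift x %d)\n",
                multipliers[i], shift);
      else
        printf ("(mult x %lu) is left alone\n", multipliers[i]);
    }
  return 0;
}
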
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721