------- Additional Comments From steven at gcc dot gnu dot org 2005-03-06 22:14 -------
Just to give people an idea of how close we are to optimizing well enough that
the calls to fold_rtx in CSE are almost all no-ops, here are some numbers
taken over all cc1 .i files on amd64:
Number of times fold_rtx is called: 13882333
Number of those calls that return something other than the incoming rtx x: 70001
Number of times fold_rtx is called from functions other than itself: 9323647
Number of those calls that return something other than x: 8526
A few rtxes that fold_rtx handles:
Loads from constant pool:
Trying to fold rtx:
(float_extend:DF (mem/u/i:SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2 S4 A32]))
Trying to fold rtx:
(mem/u/i:SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2 S4 A32])
Trying to fold rtx:
(symbol_ref/u:DI ("*.LC0") [flags 0x2])
Returning X unchanged.
Returning new rtx:
(const_double:SF 1.0e+0 [0x0.8p+1])
Returning new rtx:
(const_double:DF 1.0e+0 [0x0.8p+1])
Folded jumps:
Trying to fold rtx:
(if_then_else (eq (reg:CCZ 17 flags)
(const_int 0 [0x0]))
(label_ref 73)
(pc))
Trying to fold rtx:
(pc)
Returning X unchanged.
Trying to fold rtx:
(eq (reg:CCZ 17 flags)
(const_int 0 [0x0]))
Trying to fold rtx:
(reg:SI 66 [ D.10402 ])
Returning X unchanged.
Trying to fold rtx:
(const_int 4 [0x4])
Returning X unchanged.
Returning new rtx:
(const_int 1 [0x1])
Returning new rtx:
(label_ref 73)
Apparently an equivalent expression with lower cost:
Trying to fold rtx:
(plus:QI (subreg:QI (reg:SI 251) 0)
(subreg:QI (reg:SI 251) 0))
Trying to fold rtx:
(subreg:QI (reg:SI 251) 0)
Trying to fold rtx:
(reg:SI 251)
Returning X unchanged.
Returning X unchanged.
Trying to fold rtx:
(subreg:QI (reg:SI 251) 0)
Trying to fold rtx:
(reg:SI 251)
Returning X unchanged.
Returning X unchanged.
Returning new rtx:
(ashift:QI (subreg:QI (reg:SI 251) 0)
(const_int 1 [0x1]))
Likewise:
Trying to fold rtx:
(mult:DI (reg:DI 63 [ <variable>.comb_vect.length ])
(const_int 4 [0x4]))
Returning new rtx:
(ashift:DI (reg:DI 63 [ <variable>.comb_vect.length ])
(const_int 2 [0x2]))
It'd be interesting to find out how many of these folds the combine and later
CSE passes would catch (or miss), and how the tree-cleanup-branch compares.
I will look at the latter first.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721