On Mon, 2005-02-21 at 13:26 -0500, Kazu Hirata wrote:
> Hi Jeff,
>
> > Fixing cse.c to not use the accessor macros for REG_IN_TABLE, REG_TICK
> > and SUBREG_TICKED saves about 1% compilation time for the components
> > of cc1.  Yes, that's a net 1% improvement by dropping the abstraction
> > layer.
>
> Yes, I've noticed the problem.  In my defense, the code in question
> was even worse before I touched it. :-)  With the old code, every time
> we access a cse_reg_info entry that is different from the last access,
> we were generating a function call.  Nowadays, we avoid calls to
> get_cse_reg_info_1 95% of the time.

I'm not assigning any blame :-)

> Of course, it's tough to beat the performance of your explicit
> initialization approach, but here are a couple of things that I have
> thought about while keeping some abstraction layer.
>
> The first thought is to expose the timestamp update to the user of
> those macros that you mentioned.
>
> /* Find a cse_reg_info entry for REGNO.  */
>
> static inline struct cse_reg_info *
> get_cse_reg_info (unsigned int regno)
> {
>   struct cse_reg_info *p = &cse_reg_info_table[regno];
>
>   /* If this entry has not been initialized, go ahead and initialize
>      it.  */
>   if (p->timestamp != cse_reg_info_timestamp)
>     {
>       get_cse_reg_info_1 (regno);
>       p->timestamp = cse_reg_info_timestamp;  /* <- Look!  */
>     }
>
>   return p;
> }
>
> This way, DOM may be able to do jump threading to some extent and
> remove a lot of the timestamp checks.  Of course, jump threading
> opportunities are blocked when we have a non-pure/const function call
> like so:

Nope.  That won't solve the problem, even within a conditional like

  REG_IN_TABLE (i) >= 0 && REG_IN_TABLE (i) != REG_TICK (i)

You can effectively get this code by changing the order of statements in
get_cse_reg_info_1 and inlining it.

The problem is we're going to have two codepaths (one where we
initialize, one where we do not) which merge so that we can test that
in_table >= 0.

ie:

  # BLOCK 6
  # PRED: 21 [89.0%] (dfs_back,true,exec) 5 [100.0%] (fallthru)
  # ivtmp.206_69 = PHI <ivtmp.206_68(21), ivtmp.206_4(5)>;
  # ivtmp.204_820 = PHI <ivtmp.204_819(21), ivtmp.204_70(5)>;
  # i_15 = PHI <i_560(21), regno_518(5)>;
<L6>:;
  cse_reg_info_table.98_525 = cse_reg_info_table;
  p_526 = cse_reg_info_table.98_525 + ivtmp.204_820;
  D.20578_527 = p_526->timestamp;
  cse_reg_info_timestamp.99_528 = cse_reg_info_timestamp;
  if (D.20578_527 != cse_reg_info_timestamp.99_528) goto <L7>; else goto <L8>;
  # SUCC: 7 [71.0%] (true,exec) 8 [29.0%] (false,exec)

  # BLOCK 7
  # PRED: 6 [71.0%] (true,exec)
<L7>:;
  p_526->reg_tick = 1;
  p_526->reg_in_table = -1;
  p_526->subreg_ticked = 0ffffffff;
  D.20589_121 = (int) ivtmp.206_69;
  p_526->reg_qty = D.20589_121;
  cse_reg_info_timestamp.97_733 = cse_reg_info_timestamp;
  p_526->timestamp = cse_reg_info_timestamp.97_733;
  # SUCC: 8 [100.0%] (fallthru,exec)

  # BLOCK 8
  # PRED: 6 [29.0%] (false,exec) 7 [100.0%] (fallthru,exec)
<L8>:;
  D.20419_531 = p_526->reg_in_table;
  if (D.20419_531 >= 0) goto <L10>; else goto <L18>;
  # SUCC: 9 [79.0%] (true,exec) 15 [21.0%] (false,exec)

  # BLOCK 9
  # PRED: 8 [79.0%] (true,exec)
<L10>:;
  D.20597_641 = p_526->timestamp;
  cse_reg_info_timestamp.99_642 = cse_reg_info_timestamp;
  if (D.20597_641 != cse_reg_info_timestamp.99_642) goto <L11>; else goto <L12>;
  # SUCC: 10 [71.0%] (true,exec) 11 [29.0%] (false,exec)

You'll see the assignment to cse_reg_info[regno]->timestamp in block #7.
Then the merge point in block #8 to test the value of REG_IN_TABLE.  We
then check the timestamp again in block #9.  Block #7 is not a direct
predecessor of block #9, so jump threading isn't going to help.

Jeff
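
For readers following along, here is a rough source-level sketch of the
pattern behind the dump above.  It is not the cse.c code: the structure
layout, the table size NREGS, the reg_qty placeholder value and the
timestamp_checks counter are all assumptions made for illustration.  It
only shows that each accessor macro in the conditional repeats the
timestamp comparison, while fetching the entry once by hand does it a
single time.

```c
/* Simplified stand-in for the cse.c entry; field names follow the dump,
   the layout itself is an assumption.  */
struct cse_reg_info
{
  unsigned int timestamp;
  int reg_qty;
  int reg_tick;
  int reg_in_table;
  unsigned int subreg_ticked;
};

#define NREGS 8   /* Hypothetical table size.  */

static struct cse_reg_info cse_reg_info_table[NREGS];
static unsigned int cse_reg_info_timestamp = 1;

/* Hypothetical instrumentation, not in cse.c: counts how many times the
   timestamp comparison is executed.  */
static unsigned int timestamp_checks;

/* Find the entry for REGNO, lazily initializing it when its timestamp
   is out of date (compare blocks 6 and 7 in the dump).  */
static inline struct cse_reg_info *
get_cse_reg_info (unsigned int regno)
{
  struct cse_reg_info *p = &cse_reg_info_table[regno];

  timestamp_checks++;
  if (p->timestamp != cse_reg_info_timestamp)
    {
      p->reg_tick = 1;
      p->reg_in_table = -1;
      p->subreg_ticked = (unsigned int) -1;
      p->reg_qty = -(int) regno - 1;   /* Placeholder value.  */
      p->timestamp = cse_reg_info_timestamp;
    }

  return p;
}

/* Accessor macros in the style discussed in this thread.  */
#define REG_TICK(N) (get_cse_reg_info (N)->reg_tick)
#define REG_IN_TABLE(N) (get_cse_reg_info (N)->reg_in_table)

/* The conditional written with the macros: every macro use goes
   through get_cse_reg_info and so repeats the timestamp check.  */
static int
stale_via_macros (unsigned int i)
{
  return REG_IN_TABLE (i) >= 0 && REG_IN_TABLE (i) != REG_TICK (i);
}

/* The same conditional with the lookup done once by hand.  */
static int
stale_via_pointer (unsigned int i)
{
  struct cse_reg_info *p = get_cse_reg_info (i);

  return p->reg_in_table >= 0 && p->reg_in_table != p->reg_tick;
}
```

If an entry has reg_in_table set to 0 and reg_tick left at 1, both
functions return 1, but stale_via_macros bumps timestamp_checks three
times and stale_via_pointer only once.  The counter makes the repeated
checks visible at run time; whether an optimizer can remove them in the
macro version is exactly what the dump above is about.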