Sigh. I can't wait for this code to become less critical, both in terms of runtime and compile time performance. There are so many things in cse.c that are just plain bad....
cse.c has a fair amount of complexity in its hash table management code to avoid spending too much time invalidating/removing items in the hash table when a value of an expression changes. Specifically we have REG_TICK, REG_IN_TABLE and other fields to track when entries are valid. It's fairly common to have code which looks something like this (from mention_regs): for (i = regno; i < endregno; i++) { if (REG_IN_TABLE (i) >= 0 && REG_IN_TABLE (i) != REG_TICK (i)) remove_invalid_refs (i); REG_IN_TABLE (i) = REG_TICK (i); SUBREG_TICKED (i) = -1; } Looks pretty simple and straightforward. Let's dig a little deeper into those accessor macros... /* Get the number of times this register has been updated in this basic block. */ #define REG_TICK(N) (get_cse_reg_info (N)->reg_tick) /* Get the point at which REG was recorded in the table. */ #define REG_IN_TABLE(N) (get_cse_reg_info (N)->reg_in_table) /* Get the SUBREG set at the last increment to REG_TICK (-1 if not a SUBREG). */ #define SUBREG_TICKED(N) (get_cse_reg_info (N)->subreg_ticked) Hmmm, it's starting to get interesting -- each macro has a function call to get_cse_reg_info, which looks like: static inline struct cse_reg_info * get_cse_reg_info (unsigned int regno) { struct cse_reg_info *p = &cse_reg_info_table[regno]; /* If this entry has not been initialized, go ahead and initialize it. */ if (p->timestamp != cse_reg_info_timestamp) get_cse_reg_info_1 (regno); return p; } OUCH OUCH OUCH. Yes, this is inlined, but that's of little/no value with the function call to get_cse_reg_info_1. Let me show why by looking at the tree dumps for the conditional above: if (REG_IN_TABLE (i) >= 0 && REG_IN_TABLE (i) != REG_TICK (i)) remove_invalid_refs (i); # BLOCK 5 # PRED: 4 [50.0%] (true,exec) <L119>:; D.20813_497 = regno_328 * 20; ivtmp.200_122 = (struct cse_reg_info *) D.20813_497; # SUCC: 6 [100.0%] (fallthru) OK. We're just getting the offset of the entry within the table that we want. # BLOCK 6 # PRED: 21 [89.0%] (dfs_back,true,exec) 5 [100.0%] (fallthru) # ivtmp.200_499 = PHI <ivtmp.200_498(21), ivtmp.200_122(5)>; # i_33 = PHI <i_370(21), regno_328(5)>; <L6>:; cse_reg_info_table.99_335 = cse_reg_info_table; p_336 = cse_reg_info_table.99_335 + ivtmp.200_499; D.20520_337 = p_336->timestamp; cse_reg_info_timestamp.100_338 = cse_reg_info_timestamp; if (D.20520_337 != cse_reg_info_timestamp.100_338) goto <L7>; else goto <L8>; # SUCC: 7 [51.2%] (true,exec) 8 [48.8%] (false,exec) Check and see if the entry's timestamp indicates the entry is valid/initialized. # BLOCK 7 # PRED: 6 [51.2%] (true,exec) <L7>:; get_cse_reg_info_1 (i_33); # SUCC: 8 [100.0%] (fallthru,exec) Initialize this entry (if necessary). # BLOCK 8 # PRED: 6 [48.8%] (false,exec) 7 [100.0%] (fallthru,exec) <L8>:; D.20361_341 = p_336->reg_in_table; if (D.20361_341 >= 0) goto <L10>; else goto <L18>; # SUCC: 9 [79.0%] (true,exec) 15 [21.0%] (false,exec) Load REG_IN_TABLE for this register. So far so good. # BLOCK 9 # PRED: 8 [79.0%] (true,exec) <L10>:; cse_reg_info_table.99_383 = cse_reg_info_table; p_384 = cse_reg_info_table.99_383 + ivtmp.200_499; D.20529_385 = p_384->timestamp; cse_reg_info_timestamp.100_386 = cse_reg_info_timestamp; if (D.20529_385 != cse_reg_info_timestamp.100_386) goto <L11>; else goto <L12>; # SUCC: 10 [51.2%] (true,exec) 11 [48.8%] (false,exec) Totally useless -- we're verifying that this entry is valid again. There's no way this entry can be invalid at this point. So, let's see. 3 loads, 1 comparison and 1 arithmetic operation we don't need. # BLOCK 10 # PRED: 9 [51.2%] (true,exec) <L11>:; get_cse_reg_info_1 (i_33); # SUCC: 11 [100.0%] (fallthru,exec) This can't ever be reached at runtime because there's no way the entry would need initialization again. # BLOCK 11 # PRED: 9 [48.8%] (false,exec) 10 [100.0%] (fallthru,exec) <L12>:; D.20363_389 = p_384->reg_in_table; cse_reg_info_table.99_393 = cse_reg_info_table; p_394 = cse_reg_info_table.99_393 + ivtmp.200_499; D.20538_395 = p_394->timestamp; cse_reg_info_timestamp.100_396 = cse_reg_info_timestamp; if (D.20538_395 != cse_reg_info_timestamp.100_396) goto <L14>; else goto <L15>; # SUCC: 12 [51.2%] (true,exec) 13 [48.8%] (false,exec) Totally useless. We don't need to load the REG_IN_TABLE field again -- we already loaded it in block #8 above. But the calls (which can never be reached at runtime) must be considered as potentially modifying REG_IN_TABLE. Ugh. Worse yet we then proceed to verify this entry is valid AGAIN. 4 loads, 1 comparison and 1 arithmetic operation that we don't need. # BLOCK 12 # PRED: 11 [51.2%] (true,exec) <L14>:; get_cse_reg_info_1 (i_33); # SUCC: 13 [100.0%] (fallthru,exec) ANother call that can't be reached at runtime. We certainly don't need to initialize the entry 3 times :( # BLOCK 13 # PRED: 11 [48.8%] (false,exec) 12 [100.0%] (fallthru,exec) <L15>:; D.20365_399 = p_394->reg_tick; if (D.20363_389 != D.20365_399) goto <L17>; else goto <L18>; # SUCC: 14 [51.2%] (true,exec) 15 [48.8%] (false,exec) Load REG_TICK and compare. This is necessary. # BLOCK 14 # PRED: 13 [51.2%] (true,exec) <L17>:; remove_invalid_refs (i_33); Remove the entry if all the tests passed. Basically we've got 7 loads, 2 arithmetic instructions, and 2 compare/branch instructions which we execute, but which are totally useless. Plus we've got two calls which can never be reached at runtime. Basically each invocation of an accessor macro results in a test for validity and conditional initialization -- which can't be removed. OUCH OUCH OUCH. I realize that we are trying to write clean, easy to read code by using a level of abstraction, but the abstraction is really getting in the way of achieving reasonable compile-time performance. Here's what I get by explicitly calling get_cse_reg_info once and just doing dereferences out of the structure: # BLOCK 5 # PRED: 4 [50.0%] (true,exec) <L80>:; D.20623_259 = regno_191 * 20; ivtmp.200_80 = (struct cse_reg_info *) D.20623_259; # SUCC: 6 [100.0%] (fallthru) Same as the previous code -- get the offset of the entry within the table that we want. # BLOCK 6 # PRED: 11 [89.0%] (dfs_back,true,exec) 5 [100.0%] (fallthru) # ivtmp.200_261 = PHI <ivtmp.200_260(11), ivtmp.200_80(5)>; # i_19 = PHI <i_207(11), regno_191(5)>; <L6>:; cse_reg_info_table.99_198 = cse_reg_info_table; reg_info_199 = cse_reg_info_table.99_198 + ivtmp.200_261; D.20486_200 = reg_info_199->timestamp; cse_reg_info_timestamp.100_201 = cse_reg_info_timestamp; if (D.20486_200 != cse_reg_info_timestamp.100_201) goto <L7>; else goto <L8>; # SUCC: 7 [51.2%] (true,exec) 8 [48.8%] (false,exec) Test the entry for validity. Same as previous code. # BLOCK 7 # PRED: 6 [51.2%] (true,exec) <L7>:; get_cse_reg_info_1 (i_19); # SUCC: 8 [100.0%] (fallthru,exec) Conditionally initialize the entry. Same as previous code. # BLOCK 8 # PRED: 6 [48.8%] (false,exec) 7 [100.0%] (fallthru,exec) <L8>:; D.20363_205 = reg_info_199->reg_in_table; if (D.20363_205 >= 0) goto <L10>; else goto <L12>; # SUCC: 9 [79.0%] (true,exec) 11 [21.0%] (false,exec) Whee, load the REG_IN_TABLE bit and test it. # BLOCK 9 # PRED: 8 [79.0%] (true,exec) <L10>:; D.20364_209 = reg_info_199->reg_tick; if (D.20363_205 != D.20364_209) goto <L11>; else goto <L12>; # SUCC: 10 [51.2%] (true,exec) 11 [48.8%] (false,exec) Load the REG_TICK and compare it to the REG_IN_TABLE. # BLOCK 10 # PRED: 9 [51.2%] (true,exec) <L11>:; remove_invalid_refs (i_19); # SUCC: 11 [100.0%] (fallthru,exec) Invalidate entries if necessary. As you can see the second is *FAR* more efficient and compact. In fact, it's nearly optimal. Fixing cse.c to not use the accessor macros for REG_IN_TABLE, REG_TICK and SUBREG_TICKED saves about 1% compilation time for the components of cc1. Yes, that's a net 1% improvement by dropping the abstraction layer. Comments? jeff