Hi Jeff,

> Fixing cse.c to not use the accessor macros for REG_IN_TABLE, REG_TICK
> and SUBREG_TICKED saves about 1% compilation time for the components
> of cc1.  Yes, that's a net 1% improvement by dropping the abstraction
> layer.

Yes, I've noticed the problem.  In my defense, the code in question
was even worse before I touched it. :-)  With the old code, every time
we accessed a cse_reg_info entry that was different from the last one
accessed, we generated a function call.  Nowadays, we avoid calls to
get_cse_reg_info_1 95% of the time.

Of course, it's tough to beat the performance of your explicit
initialization approach, but here are a couple of things that I have
thought about that keep some abstraction layer.

The first thought is to expose the timestamp update to the user of
those macros that you mentioned.

  /* Find a cse_reg_info entry for REGNO.  */

  static inline struct cse_reg_info *
  get_cse_reg_info (unsigned int regno)
  {
    struct cse_reg_info *p = &cse_reg_info_table[regno];

    /* If this entry has not been initialized, go ahead and initialize
       it.  */
    if (p->timestamp != cse_reg_info_timestamp)
      {
        get_cse_reg_info_1 (regno);
        p->timestamp = cse_reg_info_timestamp;  /* <- Look! */
      }

    return p;
  }

This way, DOM may be able to do jump threading to some extent and
remove a lot of the timestamp checks.  Of course, jump threading
opportunities are blocked when we have a non-pure/const function call
between accesses, like so:

  for (i = regno; i < endregno; i++)
    {
      if (REG_IN_TABLE (i) >= 0 && REG_IN_TABLE (i) != REG_TICK (i))
        remove_invalid_refs (i);  /* <- Look! */

      REG_IN_TABLE (i) = REG_TICK (i);
      SUBREG_TICKED (i) = -1;
    }

The second thought is to initialize all of the cse_reg_info entries at
the beginning of cse_main.  Set aside a bitmap with as many bits as
max_regs.  Whenever we use one of these accessor macros for register
k, set bit k, saying "cse_reg_info_table[k] is in use."  This way, when
we are done with a basic block, we can walk the bitmap and reinitialize
only the entries that were used.  Again, a good optimizer should be
able to eliminate most of these bit sets, but a non-pure/const function
call will block the cleanup opportunities.  Of course, this bitmap walk
is far more expensive than cse_reg_info_timestamp++.

Kazu Hirata
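
For illustration only, the second idea (eager initialization plus an
in-use bitmap) might look roughly like the standalone sketch below.
This is not cse.c code: MAX_REGS, struct reg_info, in_use, init_all,
get_reg_info and end_basic_block are all hypothetical names, the bitmap
is a plain array of unsigned long rather than one of GCC's bitmap
types, and the initial field values are assumed for the example.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_REGS 64   /* hypothetical stand-in for max_regs */
#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((MAX_REGS + WORD_BITS - 1) / WORD_BITS)

/* Simplified stand-in for cse_reg_info; note there is no timestamp.  */
struct reg_info
{
  int reg_tick;
  int reg_in_table;
  int subreg_ticked;
};

static struct reg_info reg_info_table[MAX_REGS];
static unsigned long in_use[BITMAP_WORDS];

/* Reset one entry to its initial state (values assumed here).  */
static void
init_reg_info (unsigned int regno)
{
  struct reg_info *p = &reg_info_table[regno];
  p->reg_tick = 1;
  p->reg_in_table = -1;
  p->subreg_ticked = -1;
}

/* Done once up front, as at the beginning of cse_main.  */
static void
init_all (void)
{
  unsigned int i;
  for (i = 0; i < MAX_REGS; i++)
    init_reg_info (i);
  memset (in_use, 0, sizeof in_use);
}

/* Accessor: no validity check, just record that REGNO was touched.  */
static inline struct reg_info *
get_reg_info (unsigned int regno)
{
  assert (regno < MAX_REGS);
  in_use[regno / WORD_BITS] |= 1UL << (regno % WORD_BITS);
  return &reg_info_table[regno];
}

/* At the end of a basic block, walk the bitmap and reinitialize only
   the entries that were used.  Returns how many were reset.  */
static unsigned int
end_basic_block (void)
{
  unsigned int w, b, count = 0;
  for (w = 0; w < BITMAP_WORDS; w++)
    {
      unsigned long bits = in_use[w];
      for (b = 0; bits != 0; b++, bits >>= 1)
        if (bits & 1)
          {
            init_reg_info (w * (unsigned int) WORD_BITS + b);
            count++;
          }
      in_use[w] = 0;
    }
  return count;
}
```

The accessor becomes a single bit set plus an address computation, and
the per-block cost moves into end_basic_block, which is the bitmap walk
that is more expensive than bumping a timestamp.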