------- Additional Comments From paolo dot bonzini at lu dot unisi dot ch 2005-08-17 20:07 -------
Subject: Re: [meta-bug] optimizations that CSE still catches

>> unsigned outcnt;
>> extern void flush_outbuf(void);
>>
>> void
>> bi_windup(unsigned char *outbuf, unsigned char bi_buf)
>> {
>>     outbuf[outcnt] = bi_buf;
>>     if (outcnt == 16384)
>>         flush_outbuf();
>>     outbuf[outcnt] = bi_buf;
>> }
>>
>
>Presumably the store into outbuf prevents the SSA optimizers from
>commonizing the first two loads of outcnt and the call to flush_outbuf
>prevents the SSA optimizers from commonizing the last load of outcnt on
>the path which bypasses the call to flush_outbuf. Right?
>

Not really. First of all, as stevenb pointed out on IRC, this is quite
specific to powerpc-apple-darwin and other targets where programs are
compiled as PIC by default. Steven's SPEC testing under Linux has not
shown this behavior, but shared libraries there *will* suffer from the
same problem!

We'd want the code to become

    void
    bi_windup(unsigned char *outbuf, unsigned char bi_buf)
    {
        int t1 = outcnt;
        outbuf[t1] = bi_buf;
        int t2 = outcnt, t3;
        if (t2 == 16384) {
            flush_outbuf();
            t3 = outcnt;
        } else
            t3 = t2;
        outbuf[t3] = bi_buf;
    }

If we disable CSE path following, and keep only one GCSE pass, we
"waste" the opportunity to do this optimization, because we generate
temporaries for the partially redundant address of outcnt. With two
GCSE passes, the second is able to eliminate the partially redundant
load.

Of course what we really miss is load PRE on the tree level, but it is
good that --param max-gcse-passes=2 can be a replacement of
-fcse-skip-blocks -fcse-follow-jumps.

Testing mainline GCC against a patch including no path following + 2
GCSE passes + my forward propagation pass, I'm seeing SPEC improvements
of +2 to +8% on powerpc-apple-darwin.

Paolo

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19721