------- Additional Comments From roger at eyesopen dot com 2005-08-11 14:56 -------
I'll take a look, but on first inspection this looks more like a register allocation issue than a reg-stack problem. In the first (4.0) case, the accumulator "result" is assigned a hard register in the loop, whilst in the second (4.1) it is being placed in memory, at -16(%ebp). This may also explain why extracting that loop into a stand-alone function produces optimal/original code, as the register allocator gets less confused by other influences in the function. The extracted code is also even better than 4.0's, as it avoids writing "result" to memory on each iteration (store sinking).
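
To make the store-sinking point concrete, here is a minimal C sketch. It is not the testcase from this PR: the function names, the dot-product loop body and the "out" parameter are all hypothetical, chosen only to show the difference between storing the accumulator on every iteration and storing it once after the loop. The second function is store sinking done by hand, and it is only equivalent to the first under the assumption that "out" does not alias "a" or "b".

```c
#include <stddef.h>

/* Hypothetical illustration only; not the code from the bug report.
   The accumulator lives in memory (*out) and is written back on every
   iteration of the loop. */
void dot_store_each_iteration(const double *a, const double *b,
                              size_t n, double *out)
{
    size_t i;

    *out = 0.0;
    for (i = 0; i < n; i++)
        *out += a[i] * b[i];    /* load, add, store each time round */
}

/* Store sinking done by hand: accumulate in a local, which the compiler
   is free to keep in a hard register (an x87 stack slot on ia32), and
   write it to memory once after the loop.
   Assumption: out does not alias a or b. */
void dot_store_sunk(const double *a, const double *b,
                    size_t n, double *out)
{
    double result = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        result += a[i] * b[i];

    *out = result;              /* single store after the loop */
}
```

Both functions compute the same value for non-overlapping arguments. Whether a given GCC release keeps "result" in a register in the second form depends on the surrounding code and the options used, so the generated assembly is worth checking rather than assuming.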

The second failure does show an interesting reg-stack/reg-alloc interaction though. The "hot" accumulator value is live on the backedge and the exit edge of the loop but not on the incoming edge. Clearly, the best fix is to make this value live on the incoming edge, but failing that it is actually better to prevent it being live on the back and exit edges, and add compensation code after the loop. I.e. if the store to result in the loop used fstpl, you wouldn't need to fstp %st(0) on each loop iteration, but would instead need a compensating fldl after the loop.

I'm not sure how easy it would be to teach GCC's register allocation to take these considerations into account, or failing that, whether reg-stack could be tweaked/hacked to locally fix this up. But the fundamental problem is that reg-alloc should assign result to a hard register, as it clearly knows there are enough available in that block. reg-stack.c is just doing what it's told, and in this case it's being told to do something stupid.

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2005-08-11 14:56:31
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23322