------- Additional Comments From law at redhat dot com 2005-04-06 19:21 -------
More info.
It appears that threading one specific jump is responsible for triggering the big speedup, and it could plausibly cause the kind of effects we're seeing.

Basically, we're threading a conditional branch to a loop exit test back to the top of the loop. This has the effect of creating nested loops. That in turn causes the register allocators to make different choices about which values are kept in registers and which end up on the stack (and at what offset each object appears on the stack). That could cause the kind of decrease in L2 activity I'm seeing, particularly given the recursive nature of the function in question.

I've got a few more tests to run before I claim this to be the cause of the huge improvement, but this is the best theory that fits the data I've seen so far.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19794
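
For anyone following along, here is a rough hand-written sketch in C of the kind of transformation described in this comment. It is not the code from this PR: the function names and the `done` flag are made up for illustration, and the gotos only approximate what the optimizer does on the CFG. In the first function the `continue` jumps to the loop exit test even though `done` is known to be 0 on that path; in the second, that jump is threaded straight back to the top of the loop body, which leaves a small inner loop nested inside the original one.

```c
#include <stddef.h>

/* Hypothetical example: sum the positive entries of v up to a 0 terminator.
   Every path through the body goes back to the exit test. */
int sum_until_zero(const int *v)
{
    int done = 0, sum = 0;
    size_t i = 0;

    while (!done) {
        if (v[i] > 0) {
            sum += v[i];
            i++;
            continue;           /* jumps to the exit test; done is still 0 here */
        }
        done = (v[i] == 0);     /* only this path can change the test's outcome */
        i++;
    }
    return sum;
}

/* The same function with that one jump threaded by hand. The test is
   known to succeed on the v[i] > 0 path, so the branch goes directly to
   the top of the body. The body -> body edge forms an inner loop; the
   body -> test -> body edge is the outer loop. */
int sum_until_zero_threaded(const int *v)
{
    int done = 0, sum = 0;
    size_t i = 0;

test:
    if (done)
        goto out;
body:
    if (v[i] > 0) {
        sum += v[i];
        i++;
        goto body;              /* threaded: skips the exit test */
    }
    done = (v[i] == 0);
    i++;
    goto test;
out:
    return sum;
}
```

Both versions compute the same result. The only difference is the loop structure the later passes see, which is the sort of change that could plausibly shift register allocation and stack layout decisions.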