------- Comment #5 from rakdver at gcc dot gnu dot org  2006-09-28 11:34 -------
(In reply to comment #4)
> On x86_64 4.2 decides to unroll 9 times while on 4.1 it unrolls 8 times. This
> is a code-size regression, but other than that? The 4.2 version runs slightly
> faster than the 4.1 version, though the difference may be in the noise.

Choosing 9 instead of 8 looks weird, though :-). The reason is the following: jump threading in the vrp2 pass peels one iteration of the loop. With this change, unrolling by a factor of 9 creates smaller code (only one extra iteration needs to be peeled to make the number of iterations divisible by 9, while one would need to peel 7 more iterations to make it divisible by 8).

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
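
To illustrate the peeling arithmetic described in the comment, here is a rough sketch. The trip count of 56 is an assumption, picked only because it is consistent with the numbers quoted (it is not taken from the test case in the bug), and `iterations_to_peel` is a hypothetical helper, not a GCC function. It only shows the remainder calculation, not how the unroller actually costs its choices.

```python
def iterations_to_peel(trip_count: int, unroll_factor: int) -> int:
    """Iterations left over once the loop is unrolled by unroll_factor.

    These leftover iterations have to be peeled so that the remaining
    trip count is divisible by the unroll factor.
    """
    return trip_count % unroll_factor


# Assumed original trip count (hypothetical, not from the bug report).
ORIGINAL_TRIP_COUNT = 56

# Jump threading in vrp2 peels one iteration before the unroller runs.
remaining = ORIGINAL_TRIP_COUNT - 1

for factor in (8, 9):
    extra = iterations_to_peel(remaining, factor)
    print(f"unroll by {factor}: peel {extra} more iteration(s)")
```

Under that assumption, a factor of 8 needed no peeling before the jump-threading change (56 is divisible by 8), but needs 7 extra peeled iterations afterwards, while a factor of 9 needs only 1.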