http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50557
--- Comment #7 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2011-10-10 12:40:01 UTC ---
I don't have anything too helpful to add. This code as it stands is balanced on a knife's edge for register usage for the particular target, so it's always going to be sensitive to compiler changes (not just this one).

One thing I notice is that the loop is hand-unrolled four times. Why not let the compiler intelligently choose the unroll factor? I don't know what the result would be, but presumably the unroller has some heuristics to take target characteristics into account. Seems to me the factor of 4 is a bit aggressive for this target.
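
To illustrate the suggestion, here is a minimal sketch of the two styles. It is not the code from this bug report: the dot-product kernel and the names dot_unrolled4 and dot_rolled are hypothetical stand-ins. The first function is unrolled by hand with a fixed factor of 4 and keeps four accumulators live at once; the second is left rolled, so the unroll factor (if any) is up to the compiler, for example when building with GCC's -funroll-loops.

```c
#include <stddef.h>

/* Hypothetical kernel, hand-unrolled four times. The four partial sums
 * plus the loaded operands are all live in the loop body, which is what
 * raises register pressure on a register-poor target. */
double dot_unrolled4(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main body: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Remainder: the last n % 4 elements. */
    for (; i < n; i++)
        s0 += a[i] * b[i];

    return (s0 + s1) + (s2 + s3);
}

/* Same computation left rolled. The unroll factor is not fixed in the
 * source, so the compiler's heuristics can pick one for the target. */
double dot_rolled(const double *a, const double *b, size_t n)
{
    double s = 0.0;

    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];

    return s;
}
```

Note that the hand-unrolled version sums in a different order, so for general floating-point input the two results can differ in the last bits. Whether the rolled form actually ends up faster on a given target would have to be measured.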