http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58529
--- Comment #8 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #7)
> Can you add "-funroll-loops --param max-unroll-times=7"?

On Intel Core i5-3570 (glibc-2.18, openSUSE 13.1b1), I get with the attached
Intel .s file and today's GCC:

real    0m0.854s
user    0m0.853s
sys     0m0.001s    ICC

real    0m1.096s
user    0m1.095s
sys     0m0.001s    GCC

real    0m0.653s
user    0m0.652s
sys     0m0.002s    GCC -funroll-loops

real    0m0.661s
user    0m0.660s
sys     0m0.000s    ditto, max-unroll-times=7

I have to re-check why unrolling made it slower on that Xeon E5-2630
(comment 0) but faster on the i5.