http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49851
--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-26 14:32:35 UTC ---
(In reply to comment #2)
> AIR spends 86% of its time in DERIV[XY] (for ICC), 78% of its time there for
> GCC.
> The performance difference also reproduces when not inlining DERIV[XY] at all
> (though it's slightly less of a difference - GCC doesn't care).

Actually, it does not. Without inlining:

GCC:
air      2.42   4728556   3.99   10   0.1798

      SUBROUTINE DERIVX(D,U,Ux,Al,Np,Nd,M) /* derivx_ total: 8194999 42.3750 */
      SUBROUTINE DERIVY(D,U,Uy,Al,Np,Nd,M) /* derivy_ total: 8176250 42.2781 */

ICC:
air      2.90   4072563   3.45   10   0.1809

      SUBROUTINE DERIVX(D,U,Ux,Al,Np,Nd,M) /* derivx_ total: 8060834 47.2620 */
      SUBROUTINE DERIVY(D,U,Uy,Al,Np,Nd,M) /* derivy_ total: 7070627 41.4563 */

So there is not much difference in the total hits, which means ICC performs some
context-dependent optimization.
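To put a number on "not much difference in the total hits", the per-function totals above can be compared directly. This is a small Python sketch. It assumes the `total` figures are raw sample counts taken at the same sampling rate for both compilers. The comment does not say how the profiles were collected, so treat this as an assumption.

```python
# Per-function "total" hit counts copied from the GCC and ICC profiles above.
# Assumption: both runs were sampled at the same rate, so raw counts compare directly.
gcc = {"derivx_": 8194999, "derivy_": 8176250}
icc = {"derivx_": 8060834, "derivy_": 7070627}


def ratio(a: int, b: int) -> float:
    """Return a / b, i.e. GCC hits per ICC hit."""
    return a / b


# Ratio for each subroutine individually, then for DERIVX + DERIVY combined.
per_function = {name: ratio(gcc[name], icc[name]) for name in gcc}
combined = ratio(sum(gcc.values()), sum(icc.values()))

for name, r in per_function.items():
    print(f"{name}: GCC/ICC = {r:.3f}")
print(f"combined: GCC/ICC = {combined:.3f}")
```

Read this way, DERIVX is nearly identical between the two compilers (under 2% apart). Most of the remaining gap in the non-inlined case comes from DERIVY.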