Jakub Jelinek wrote:
Including loop unrolling in -O2 is IMNSHO a bad idea, as loop unrolling increases code size, sometimes a lot. And the distinction between -O2 and -O3 is exactly in the space-for-speed tradeoffs.
That's certainly a valid way of defining the difference (and it certainly used to be the case in the old days, when the principal extra optimization was inlining).
On many CPUs for many programs, -O3 generates slower code than -O2, because the cache footprint disadvantages override positive effects of the loop unrolling, extra inlining etc.
That's what we have found, though I would have thought it unusual that loop unrolling would run into this cache effect in most cases.
Jakub
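
For anyone following the thread who wants to see the size tradeoff concretely, here is a rough sketch in C. It is hand-written, not what GCC actually emits; the function names and the unroll factor of 4 are made up for illustration. The unrolled version does fewer compare-and-branch steps per element, but its body is about four times as long and it needs a remainder loop, and that extra code is what competes for instruction cache.

```c
#include <stddef.h>

/* Rolled loop: one add and one loop branch per element; smallest code. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4 (illustrative factor, an assumption): one loop
   branch per four elements, at the cost of a larger loop body plus a
   separate remainder loop for the last n % 4 elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)      /* remainder loop */
        s += a[i];
    return s;
}
```

Both functions return the same sum; only the amount of code differs. To see what the compiler itself does, one could build the rolled version with and without -funroll-loops and compare the object sizes with size(1). The results will vary by target and GCC version.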