https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #14 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #13)

> Usually the peeling is done to improve branch prediction on the
> prologue/epilogue.

Modern branch predictors do much better on a loop than on this kind of code:

        ands    x12, x11, 7
        beq     .L70
        cmp     x12, 1
        beq     .L55
        cmp     x12, 2
        beq     .L57
        cmp     x12, 3
        beq     .L59
        cmp     x12, 4
        beq     .L61
        cmp     x12, 5
        beq     .L63
        cmp     x12, 6
        bne     .L72

That's way too many branches close together, so most predictors will hit their
maximum-branches-per-fetch-block limit and fail to predict them.

If you wanted to peel it would have to be like:

if (n & 4)
  do 4 iterations
if (n & 2)
  do 2 iterations
if (n & 1)
  do 1 iteration

However, that's still way too much code explosion for little or no gain. The
only case where this makes sense is in a handwritten memcpy.
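
For illustration, the bit-test peeling above could look roughly like this in C.
This is only a sketch: the function name sum_peeled and the summation workload
are made up for the example, and it assumes a main loop unrolled by 8 so that
the remainder is n & 7. The remainder is handled with three independent bit
tests instead of a chain of seven compares.

```c
#include <stddef.h>

/* Hypothetical example: sum n ints. The main loop handles 8 elements per
   iteration; the remaining n & 7 elements are handled by testing bits 4, 2
   and 1 of n, so at most three conditional branches are needed. */
long sum_peeled(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    /* Main loop, unrolled by 8. */
    for (; i + 8 <= n; i += 8)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3]
           + a[i + 4] + a[i + 5] + a[i + 6] + a[i + 7];

    /* Remainder: do 4, then 2, then 1 iterations as the bits of n dictate. */
    if (n & 4) {
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        i += 4;
    }
    if (n & 2) {
        s += (long)a[i] + a[i + 1];
        i += 2;
    }
    if (n & 1)
        s += a[i];

    return s;
}
```

Even in this small case the loop body is written out four times, which is the
code explosion referred to above.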
