https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
--- Comment #14 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #13)
> Usually the peeling is done to improve branch prediction on the
> prologue/epilogue.

Modern branch predictors do much better on a loop than with this kind of code:

        ands    x12, x11, 7
        beq     .L70
        cmp     x12, 1
        beq     .L55
        cmp     x12, 2
        beq     .L57
        cmp     x12, 3
        beq     .L59
        cmp     x12, 4
        beq     .L61
        cmp     x12, 5
        beq     .L63
        cmp     x12, 6
        bne     .L72

That's way too many branches close together so most predictors will hit the
maximum branches per fetch block limit and not predict them.

If you wanted to peel it would have to be like:

if (n & 4)
  do 4 iterations
if (n & 2)
  do 2 iterations
if (n & 1)
  do 1 iteration

However that's still way too much code explosion for little or no gain. The
only case where this makes sense is in a handwritten memcpy.
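
For illustration only, here is a minimal C sketch of the two remainder strategies discussed in this comment, written by hand for an 8x unrolled loop. The function names (sum_loop_tail, sum_bit_peeled) and the summation kernel are hypothetical and are not taken from the bug report or from GCC output. The first version keeps the remainder as a plain loop; the second tests the bits of n & 7, so it needs at most three branches instead of a chain of seven compares.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: sum n 32-bit values, main loop unrolled by 8,
   remainder handled by an ordinary scalar loop. */
uint64_t sum_loop_tail(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        s += a[i];     s += a[i + 1]; s += a[i + 2]; s += a[i + 3];
        s += a[i + 4]; s += a[i + 5]; s += a[i + 6]; s += a[i + 7];
    }
    /* Remainder: 0..7 iterations of a single, well-predicted loop branch. */
    for (; i < n; i++)
        s += a[i];
    return s;
}

/* Same kernel, but the remainder (n & 7) is peeled by its binary digits:
   4, then 2, then 1 iterations. */
uint64_t sum_bit_peeled(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        s += a[i];     s += a[i + 1]; s += a[i + 2]; s += a[i + 3];
        s += a[i + 4]; s += a[i + 5]; s += a[i + 6]; s += a[i + 7];
    }
    if (n & 4) {            /* do 4 iterations */
        s += a[i]; s += a[i + 1]; s += a[i + 2]; s += a[i + 3];
        i += 4;
    }
    if (n & 2) {            /* do 2 iterations */
        s += a[i]; s += a[i + 1];
        i += 2;
    }
    if (n & 1)              /* do 1 iteration */
        s += a[i];
    return s;
}
```

Both functions return the same result for any n. The second one shows the code growth mentioned above: the loop body appears four times (8 + 4 + 2 + 1 copies of the statement) where the loop-tail version has it twice.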