When profile-based feedback indicates that a loop has a low iteration count, the loop unroller will often refuse to unroll it, even though unrolling would still be beneficial. Moreover, although the unroller knows the average iteration count of a loop, it has no concept of a prevalent (most frequent) iteration count.
In particular, the header checksumming loop in the EEMBC packetflow benchmark usually runs for ten iterations. With gcc 3.x, this loop was unrolled by a factor of four, which gave mediocre results. Since the new loop unroller was introduced in 4.0, the loop is no longer unrolled at all when compiling with profile feedback. The proper thing to do would be to unroll this loop by a factor of five. When an iteration count that is not a multiple of the chosen unroll factor is deemed sufficiently unlikely, that case can be handled by a non-unrolled loop placed after the unrolled one. The unrolled loop can use a suitably transformed inequality check on the loop start and end to verify that enough iterations are still outstanding, so that no casesi / tablejump code is needed.

See also: http://gcc.gnu.org/ml/gcc-patches/2004-09/msg02373.html

--
Summary: inept unrolling for small iteration counts
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: amylaar at gcc dot gnu dot org
OtherBugsDependingOnThis: 29842

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29946
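The transformation proposed in this report (unroll by five, guard the unrolled body with an inequality test on the remaining trip count, and fall back to a plain loop for the rare leftover iterations) can be illustrated by writing the desired result by hand in C. This is a sketch, not GCC output: the report does not show the packetflow source, so the loop body is assumed here to be an IPv4-style one's-complement sum over 16-bit words (a 20-byte header without options is exactly ten such words), and the helper names `sum_words_simple`, `sum_words_unrolled5` and `fold_checksum` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: one 16-bit word per iteration (hypothetical helper). */
uint32_t sum_words_simple(const uint16_t *p, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* Unrolled by a factor of five for the common n == 10 case
   (hypothetical helper, written by hand to mimic the desired output). */
uint32_t sum_words_unrolled5(const uint16_t *p, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;

    /* Transformed exit test: rather than i != n, check that at least five
       iterations are still outstanding.  Since i <= n always holds, n - i
       cannot wrap around. */
    while (n - i >= 5) {
        sum += p[i];
        sum += p[i + 1];
        sum += p[i + 2];
        sum += p[i + 3];
        sum += p[i + 4];
        i += 5;
    }

    /* Remainder loop, assumed to be rarely taken: finishes the 0..4
       leftover iterations without a jump table into the unrolled body. */
    for (; i < n; i++)
        sum += p[i];

    return sum;
}

/* Fold the carries back into 16 bits and take the one's complement,
   as in an IPv4-style header checksum. */
uint16_t fold_checksum(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

With the ten-word header `4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7` (checksum field zeroed), the unrolled body runs exactly twice, the remainder loop not at all, and `fold_checksum` yields `0xb861`. Because the guard only asks whether five or more iterations remain, no computed entry into the middle of the unrolled body is required, which is what would otherwise call for casesi / tablejump code.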