https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62220

            Bug ID: 62220
           Summary: missed optimization wrt module for loop variable
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: drepper.fsp+rhbz at gmail dot com

Created attachment 33376
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33376&action=edit
program to measure difference of the original and proposed optimized code

I've optimized some prominent code bases by hand to achieve significant speedup
but this optimization could have been performed by gcc.  The problem is that
some program authors underestimate the price of integer module operations.  I
attaching a test program (x86-64) below and the time difference I see about
1500%.

The general pattern is:

   loop index variable I with upper limit L

      for (I = <?>; I < L; ++I)

   inside loop use I % M where M is loop in-variant

      e.g.: var[I % M]

This could be optimized to

   compute LL = L - L % M

   loop index variable I with upper limit LL; nested second loop

     J = <?> % M
     for (I = <?>; I < LL; ) {
       for (; J < M; ++I, ++J) {

          ... loop body ...

          ... e.g., var[J]
          ... instead of var[I % M]
       }
       J = 0;
     }

Reply via email to