While developing a Free C++ library, I ran into what I believe is a bug in gcc:
it doesn't fully unroll nested loops. In the example program pasted below,
there is a nested loop of the form

for( int i = 0; i < 3; i++ )
  for( int j = 0; j < 3; j++ )
    do_something( i, j );

Only the inner loop (on j) gets completely unrolled; the outer loop (on i) is
only partially unrolled. (This is according to I. L. Taylor on
gcc@gcc.gnu.org; I don't have the skill to read the binary code myself.)

This is a huge problem for me, not a mere detail: my code runs at about 15% of
the speed it would reach if the loops were unrolled.

I cannot unroll loops by hand because this is a template library and the loops
depend on template parameters.
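
To illustrate why hand-unrolling is not an option, here is a minimal sketch of
a matrix class whose loop bounds come from a template parameter. The name
FixedMatrix and its layout are hypothetical, chosen for illustration only; this
is not Eigen's actual code.

```cpp
#include <cassert>

// Hypothetical fixed-size square matrix (illustrative, not Eigen's class).
template <int Size>
class FixedMatrix
{
public:
    double data[Size * Size];

    double & operator()( int i, int j )
    {
        assert( i >= 0 && i < Size && j >= 0 && j < Size );
        return data[i + Size * j]; // column-major storage
    }

    // The loop bounds are compile-time constants, but they differ for each
    // instantiation, so no single hand-written unrolled body fits every Size.
    void loadScaling( double factor )
    {
        for( int i = 0; i < Size; i++ )
            for( int j = 0; j < Size; j++ )
                (*this)( i, j ) = (i == j) * factor;
    }
};
```

With FixedMatrix<3> the loops are the same 3x3 nest as above, while
FixedMatrix<4> would need a different unrolled body, which is why the library
has to rely on the compiler here.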

I have made a minimal standalone example program, toto.cpp, pasted below.

This program does a nested loop if UNROLL is not defined, and does the same
thing but with the loops unrolled by hand if UNROLL is defined. On my machine,
the speed difference is huge:

g++ -DUNROLL -O3 toto.cpp -o toto   ---> toto runs in 0.3 seconds
g++ -O3 toto.cpp -o toto            ---> toto runs in 1.9 seconds
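
For anyone who wants to reproduce the measurement from inside the program
rather than with an external timer, the following is one possible sketch. The
helper name secondsFor is hypothetical, and the code assumes a C++11 or later
compiler (for std::chrono), so a newer gcc than the one reported here is
needed to build it.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls work( i ) for i = 0 .. iterations - 1 and
// returns the elapsed wall-clock time in seconds.
template <typename Work>
double secondsFor( Work work, int iterations )
{
    assert( iterations >= 0 );
    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    for( int i = 0; i < iterations; i++ )
        work( i );
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return elapsed.count();
}
```

With the Matrix class from toto.cpp, a call might look like
secondsFor( [&m]( int i ) { m.loadScaling( i ); }, 100000000 ). The result of
the work should still be printed afterwards, as toto.cpp does, so that the
optimizer cannot discard the loop entirely.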

Again, this is not an academic example but something found in a real library,
Eigen (http://eigen.tuxfamily.org). Granted, it's a math library, but still one
that is used in real apps, so fixing this gcc bug would benefit real apps.

So here is the example program toto.cpp:

-----------------------------------------------------------------------

#include <iostream>

class Matrix
{
public:
    double data[9];
    double & operator()( int i, int j )
    {
        return data[i + 3 * j];
    }
    void loadScaling( double factor );
};

void Matrix::loadScaling( double factor )
{
#ifdef UNROLL
    (*this)( 0, 0 ) = factor;
    (*this)( 1, 0 ) = 0;
    (*this)( 2, 0 ) = 0;
    (*this)( 0, 1 ) = 0;
    (*this)( 1, 1 ) = factor;
    (*this)( 2, 1 ) = 0;
    (*this)( 0, 2 ) = 0;
    (*this)( 1, 2 ) = 0;
    (*this)( 2, 2 ) = factor;
#else
    for( int i = 0; i < 3; i++ )
        for( int j = 0; j < 3; j++ )
            (*this)(i, j) = (i == j) * factor;
#endif
}

int main( int argc, char *argv[] )
{
    Matrix m;
    for( int i = 0; i < 100000000; i++ )
        m.loadScaling( i );
    std::cout << "m(0,0) = " << m(0,0) << std::endl;
    return 0;
}


-- 
           Summary: gcc doesn't unroll nested loops
           Product: gcc
           Version: 4.1.1
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jacob at math dot jussieu dot fr


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30201
