------- Comment #5 from jacob at math dot jussieu dot fr 2006-12-13 20:22 -------
Nope... with -O3 -ffast-math I get 1.9 seconds on average (this is a laptop with CPU frequency scaling, so it's difficult to get precise numbers). Adding -funroll-loops on top of -ffast-math doesn't seem to make a difference. We're very far from the 0.3 seconds I get with -DUNROLL.
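
To illustrate the kind of comparison a -DUNROLL switch makes, here is a minimal sketch. It is not the test case attached to this bug: the Matrix3 type and the loadScaling* names are hypothetical, and the operation (setting a 3x3 matrix to factor times the identity) is assumed only for illustration.

```cpp
// Hypothetical 3x3 matrix type, assumed for illustration only;
// this is NOT the class from the test case attached to the bug.
struct Matrix3 {
    double m[9];

    double& operator()(int i, int j) { return m[i * 3 + j]; }
    double operator()(int i, int j) const { return m[i * 3 + j]; }

    // Nested-loop version: unless the compiler fully unrolls both
    // loops, the (i == j) test has to be evaluated at run time.
    void loadScalingLoop(double factor) {
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                (*this)(i, j) = (i == j) ? factor : 0;
    }

    // Hand-unrolled version: nine straight-line stores, with no
    // loop counters and no comparisons left in the code.
    void loadScalingUnrolled(double factor) {
        m[0] = factor; m[1] = 0;      m[2] = 0;
        m[3] = 0;      m[4] = factor; m[5] = 0;
        m[6] = 0;      m[7] = 0;      m[8] = factor;
    }

    // Pick one of the two at compile time, in the spirit of -DUNROLL.
    void loadScaling(double factor) {
#ifdef UNROLL
        loadScalingUnrolled(factor);
#else
        loadScalingLoop(factor);
#endif
    }
};
```

Both versions compute the same matrix; the only difference is whether the unrolling is done by hand or left to the optimizer.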
Also, trying -O3 -funroll-loops again, I again get 1.9 seconds, so I think -funroll-loops didn't make any difference and I had been fooled by CPU frequency scaling.

The problem with the multiplication is not important to me; it's just something I used in this example. I could just as well have written

    for( int i = 0; i < 3; i++ )
      for( int j = 0; j < 3; j++ )
        (*this)(i, j) = (i == j) ? factor : 0;

But this turns out to be even slower. I presume that's because, since the two loops don't both get unrolled, the (i == j) ? : test turns into branches at run time.

Anyway, thanks for being supportive and for looking into my problem. May I ask again: can I hope for a g++ that fully unrolls nested loops in the near future?

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30201