------- Additional Comments From danalis at cis dot udel dot edu 2005-06-30 22:16 -------

I'm looking at the reduced testcase from comment #6, and I noticed that f() is declared to return double but does not return anything. As a result the code doesn't compile with -O3 -Wall -Werror. If I fix the bug, either by adding a "return(*ap1)" or by declaring f() to be void, the performance regression disappears.
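To make the two fixes concrete, here is a rough sketch only. The real f() is the one in comment #6; the function names, the parameter types (inferred from the call in the harness) and the one-line body below are hypothetical placeholders, not the original code.

```cpp
// Hypothetical stand-ins for f(); the real body is in comment #6.
// Parameter types are assumed from how the harness calls f().

// Fix 1: keep the double return type and actually return a value.
double f_returns(const double *py, const double *pz,
                 double *dxb, double *ap1, int j, int n)
{
    ap1[0] = py[j] + pz[j] + dxb[n];  // placeholder arithmetic, not the original loop
    return *ap1;                      // the previously missing return
}

// Fix 2: declare the function void, so no return value is expected.
void f_void(const double *py, const double *pz,
            double *dxb, double *ap1, int j, int n)
{
    ap1[0] = py[j] + pz[j] + dxb[n];  // same placeholder arithmetic
}
```

With the second variant, a caller that does "sum += f(...)" would have to read the result through ap1 instead.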
Here's the test harness I used to call the minimized testcase:

#include <iostream>
using namespace std;

// f() is the minimized testcase from comment #6 and must be defined above main().

int main(int argc, char *argv[]){
    double ay[100][100];
    const double *py, *pz;
    double *dxb, *ap1;
    double sum=0;
    int i,j,k;

    // Fill the array with distinct, non-zero values.
    for(i=0; i<100; i++){
        for(j=0; j<100; j++){
            ay[i][j] = 1000*(i+1)+2*(j+1);
        }
    }

    // Use four different rows as the four pointer arguments.
    py = ay[0];
    pz = ay[1];
    dxb = ay[2];
    ap1 = ay[3];

    for(k=0; k<100; k++){
        for(i=0; i<10000; i++){
            for(j=0; j<12; j++){
                sum += f(py,pz,dxb,ap1,j,5);
                sum /= 2;
            }
        }
    }

    // Print the result so the calls cannot be optimized away.
    cout << sum << endl;
    return 0;
}

Is that ok?

I compiled this with -O3 -mtune=pentium. Runtimes without and with the fix to f():

g++ version       without fix   with fix
2.95.3            0.31s         0.34s
3.4.3             8.72s         0.28s
4.0.0             8.83s         0.36s
4.1.0-20050625    8.80s         0.32s

Without the fix, 3.4.3 and later are roughly 28 times slower than 2.95.3, which makes this a large performance regression relative to gcc-2.95.3. With the fix, all four compilers are within 0.08s of each other.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863