http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53397
Bug #: 53397
Summary: Scimark performance drops by 10x when compiled with -O3 -march=amdfam10 due to generation of more prefetches
Classification: Unclassified
Product: gcc
Version: tree-ssa
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: venkataramanan.ku...@amd.com

With GCC 4.7 the benchmark score drops from ~400 Mflops to ~40 Mflops, almost a tenfold drop.

GCC 4.7 introduces prefetch instructions in the innermost loops of "FFT_transform_internal" (FFT.c); GCC 4.6 does not. These prefetches are causing the slowdown.

Compiling this function alone as a separate test case with -fno-prefetch-loop-arrays brings back the original score.

The problem is exposed by revision 175474:
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175474

With GCC r175473
--------------------------
gcc -O3 -march=amdfam10 *.c -o Scimark175473 -lm
vekumar@pcedinar5:/local/home/vekumar/SciMark2_bench/SciMark2> ./Scimark175473
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov)     **
**                                                              **
Using 2.00 seconds min time per kenel.
Composite Score: 99.67
FFT Mflops: 498.35 (N=1024)

With GCC r175474
-------------------------
gcc -O3 -march=amdfam10 *.c -o Scimark175474 -lm
vekumar@pcedinar5:/local/home/vekumar/SciMark2_bench/SciMark2> ./Scimark175474
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov)     **
**                                                              **
Using 2.00 seconds min time per kenel.
Composite Score: 7.73
FFT Mflops: 38.66 (N=1024)
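
For readers without the SciMark2 sources at hand, the kind of loop involved can be sketched as below. This is a hypothetical stand-in, not the actual FFT_transform_internal code: the function name butterfly_pass and its parameters are made up for illustration, and whether this reduced loop triggers the same prefetch generation has not been verified. It only shows the general shape of a radix-2 butterfly pass over interleaved complex data, where the access stride (2 * dual) grows with each pass. Compiling such a file with -O3 -march=amdfam10 -S, once with and once without -fno-prefetch-loop-arrays, and comparing the assembly for prefetch instructions would be one way to check.

```c
#include <stddef.h>

/*
 * Hypothetical reduction of a radix-2 FFT butterfly pass (NOT the
 * SciMark2 source). 'data' holds n complex values stored as
 * interleaved (real, imag) doubles. 'dual' is the butterfly
 * half-width, so consecutive iterations are 2 * dual elements apart.
 */
void butterfly_pass(double *data, size_t n, size_t dual,
                    double w_real, double w_imag)
{
    for (size_t b = 0; b < n; b += 2 * dual) {
        size_t i = 2 * b;           /* index of the upper element */
        size_t j = 2 * (b + dual);  /* index of the lower element */

        double z_real = data[j];
        double z_imag = data[j + 1];

        /* Multiply the lower element by the twiddle factor w. */
        double wd_real = w_real * z_real - w_imag * z_imag;
        double wd_imag = w_real * z_imag + w_imag * z_real;

        /* Butterfly: lower = upper - w*z, upper = upper + w*z. */
        data[j]     = data[i]     - wd_real;
        data[j + 1] = data[i + 1] - wd_imag;
        data[i]     += wd_real;
        data[i + 1] += wd_imag;
    }
}
```

With a large stride and a short trip count, a software prefetch issued per iteration can cost more than it saves, which is consistent with the slowdown reported above.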