On an AMD amdfam10 system, gcc 4.5 (892s) is 15% faster than gcc 4.6 (1026s), i.e. gcc 4.6 takes about 15% longer (1026/892 ≈ 1.15), with the following settings:

4.6:
gcc version 4.6.0 20100812 (experimental) (GCC)
COPTIMIZE     = -Ofast -funroll-all-loops -fno-tree-pre --param prefetch-latency=700 -mveclibabi=acml -m64 -march=amdfam10
FOPTIMIZE     = -Ofast -funroll-all-loops -fno-tree-pre -mveclibabi=acml -m64 -march=amdfam10
EXTRA_LDFLAGS = -L$(ACML_DIR) -lacml_mv

4.5:
gcc version 4.5.2 20100818 (prerelease) (GCC)
COPTIMIZE     = -O3 -ffast-math -funroll-all-loops -fno-tree-pre -fprefetch-loop-arrays --param prefetch-latency=700 -mveclibabi=acml -m64 -march=amdfam10
FOPTIMIZE     = -O3 -ffast-math -funroll-all-loops -fno-tree-pre -mveclibabi=acml -m64 -march=amdfam10
EXTRA_LDFLAGS = -L$(ACML_DIR) -lacml_mv

NOTE that for gcc 4.6, "-Ofast" = "-O3 -ffast-math" and "-fprefetch-loop-arrays" is turned on @ -O3.
Also acml4.4.0 is used for both tests.

--
           Summary: CPU2006 cactusADM: gcc 4.6 15% regression from 4.5
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: changpeng dot fang at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45389