On an AMD amdfam10 system, gcc 4.5 (713s) is 7% faster than gcc 4.6 (763s) with the following settings:

4.6:
gcc version 4.6.0 20100812 (experimental) (GCC)
FOPTIMIZE = -Ofast -funroll-all-loops -fno-tree-pre -mveclibabi=acml -m64 -march=amdfam10
EXTRA_LDFLAGS = -L$(ACML_DIR) -lacml_mv

4.5:
gcc version 4.5.2 20100818 (prerelease) (GCC)
COPTIMIZE = -O3 -ffast-math -funroll-all-loops -fno-tree-pre
FOPTIMIZE = -O3 -ffast-math -funroll-all-loops -fno-tree-pre -mveclibabi=acml -m64 -march=amdfam10
EXTRA_LDFLAGS = -L$(ACML_DIR) -lacml_mv

NOTE that for gcc 4.6, "-Ofast" = "-O3 -ffast-math", and "-fprefetch-loop-arrays" is turned on at -O3. Also, acml4.4.0 is used for both tests.

--
Summary: CPU2006 434.zeusmp: gcc 4.6 7% regression from gcc 4.5
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: changpeng dot fang at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45390