https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

Bug ID: 78611
Summary: -march=native makes code 3x slower
Product: gcc
Version: 6.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: pepalogik at seznam dot cz
Target Milestone: ---

Created attachment 40199
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40199&action=edit
Source code, include files, and inputs

Hi,

I encountered the problem in version 5.4.0, then installed 6.2.0, and it's still the same. Details below and test case attached.

jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ gfortran-6 -v
Using built-in specs.
COLLECT_GCC=gfortran-6
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 6.2.0-3ubuntu11~16.04' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.2.0 20160901 (Ubuntu 6.2.0-3ubuntu11~16.04)

jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ gfortran-6 phsh1.f -std=legacy -I. -o default/phsh1
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ cd default/
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1/default $ time ./phsh1 < ../bmtz
Slab or Bulk calculation?
input 1 for Slab or 0 for Bulk
Input the MTZ value from the substrate calculation

real    72m51.345s
user    72m48.584s
sys     0m0.968s
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1/default $ cd ..
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ gfortran-6 phsh1.f -std=legacy -I. -march=native -o march/phsh1
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ cd march/
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1/march $ time ./phsh1 < ../bmtz
Slab or Bulk calculation?
input 1 for Slab or 0 for Bulk
Input the MTZ value from the substrate calculation

real    217m56.080s
user    217m52.092s
sys     0m1.096s

As shown, the code compiled with -march=native is about 3x slower (roughly 218 minutes versus 73 minutes of wall-clock time). All outputs are identical, so it is solely a performance issue. Adding -O3 does not help much.

My CPU is an Intel(R) Core(TM) i3-3217U CPU @ 1.80GHz with these flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts

The code is an old, single-threaded F77 program calculating crystal potentials. The profiler shows that almost all the time is spent in the subroutine MTZ.
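
For reference, the "3x" figure can be checked by converting the two "real" times from the runs above to seconds and dividing them. This is only a minimal sketch, assuming a POSIX shell with awk available; the helper name to_seconds is hypothetical, and the two durations are copied from the time output in this report.

```shell
#!/bin/sh
# Convert a duration in the format printed by `time` (e.g. 72m51.345s) to seconds.
# to_seconds is a hypothetical helper name used only for this illustration.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Wall-clock ("real") times copied from the two runs in this report.
default=$(to_seconds 72m51.345s)    # built without -march=native
native=$(to_seconds 217m56.080s)    # built with -march=native

# Slowdown factor of the -march=native build relative to the default build.
ratio=$(awk -v a="$native" -v b="$default" 'BEGIN { printf "%.2f\n", a / b }')

echo "default: ${default}s  native: ${native}s  slowdown: ${ratio}x"
```

The same arithmetic applied to the "user" times should give nearly the same factor, since both runs spend almost all of their time in user mode.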