http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57024
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2013-04-21 17:33:46 UTC ---
Maybe the processor is just badly detected or modeled? I noticed on the same
machine another surprising tuning issue. Adding -march=native changes for some
other code:

        .cfi_startproc
-       movsd   %xmm0, -8(%rsp)
-       movq    -8(%rsp), %rax
+       movq    %xmm0, %rax
        shrq    $32, %rax
        andl    $2146435072, %eax
        subl    $2146435072, %eax
        shrl    $31, %eax
        ret
        .cfi_endproc

and slows the function down by 10% (I assume that the difference is due to the
instructions, not some random code alignment difference, I could be wrong).
This is a well known tuning parameter, so I am surprised -march=native makes a
worse decision than generic.

For reference, the output with -v:

 /usr/lib/gcc-snapshot/libexec/gcc/x86_64-linux-gnu/4.9.0/cc1 -quiet -v
 -imultilib . -imultiarch x86_64-linux-gnu -D finite=finite1 test.c
 -march=core2 -mcx16 -msahf -mno-movbe -mno-aes -mno-pclmul -mno-popcnt
 -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm
 -mno-avx -mno-avx2 -mno-sse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle
 -mno-rdrnd -mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr
 -mno-xsave -mno-xsaveopt --param l1-cache-size=32
 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=core2 -quiet
 -dumpbase test.c -auxbase-strip y.s -O3 -Wall -Wextra -version
 -fno-builtin-finite -o y.s

And full /proc/cpuinfo (for the first core):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU T9600 @ 2.80GHz
stepping        : 10
microcode       : 0xa07
cpu MHz         : 2800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dtherm tpr_shadow vnmi flexpriority
bogomips        : 5584.90
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
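
For anyone reading the assembly in this comment without the source at hand, a C function along the following lines would plausibly compile to that instruction sequence. This is only a reconstruction and not the actual test.c, which is not attached here: the name finite1 is borrowed from the -D finite=finite1 option in the cc1 command line, and the body is inferred purely from the instructions (2146435072 is 0x7ff00000, the exponent mask in the high word of an IEEE double).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reconstruction: returns 1 if x is finite, 0 for Inf or NaN.
   Assumes IEEE 754 binary64 doubles, as on x86_64. */
int finite1(double x)
{
    uint64_t bits;
    uint32_t hi;

    /* Move the double's bit pattern into an integer register.  This is the
       step where the two code sequences differ: a direct movq from %xmm0,
       or a store to the stack followed by a reload. */
    memcpy(&bits, &x, sizeof bits);

    hi = (uint32_t)(bits >> 32);   /* shrq $32, %rax */
    hi &= 0x7ff00000u;             /* andl $2146435072, %eax */
    hi -= 0x7ff00000u;             /* subl $2146435072, %eax */

    /* The subtraction wraps to a value with the top bit set unless the
       exponent field was all ones (Inf or NaN). */
    return (int)(hi >> 31);        /* shrl $31, %eax */
}
```

Comparing the output of -O3 with and without -march=native on a file like this should show whether the same movq versus store/reload difference appears.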