http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57024



--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2013-04-21 17:33:46 UTC ---

Maybe the processor is just badly detected or modeled? I noticed another
surprising tuning issue on the same machine. For some other code, adding
-march=native changes the generated assembly as follows:



     .cfi_startproc
-    movsd    %xmm0, -8(%rsp)
-    movq    -8(%rsp), %rax
+    movq    %xmm0, %rax
     shrq    $32, %rax
     andl    $2146435072, %eax
     subl    $2146435072, %eax
     shrl    $31, %eax
     ret
     .cfi_endproc



and slows the function down by 10% (I assume the difference is due to the
instructions rather than some random code alignment effect, but I could be
wrong). This choice (moving directly between SSE and integer registers versus
going through memory) is a well-known tuning parameter, so I am surprised that
-march=native makes a worse decision than generic. For reference, the output
with -v:



 /usr/lib/gcc-snapshot/libexec/gcc/x86_64-linux-gnu/4.9.0/cc1 -quiet -v
-imultilib . -imultiarch x86_64-linux-gnu -D finite=finite1 test.c -march=core2
-mcx16 -msahf -mno-movbe -mno-aes -mno-pclmul -mno-popcnt -mno-abm -mno-lwp
-mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mno-avx -mno-avx2
-mno-sse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c
-mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mno-xsave -mno-xsaveopt
--param l1-cache-size=32 --param l1-cache-line-size=64 --param
l2-cache-size=6144 -mtune=core2 -quiet -dumpbase test.c -auxbase-strip y.s -O3
-Wall -Wextra -version -fno-builtin-finite -o y.s
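
The source of the function is not shown here. Judging from the assembly (it
isolates the exponent bits of the high word with the mask 2146435072, which is
0x7ff00000) and from the -D finite=finite1 and -fno-builtin-finite options in
the command line, it is presumably a finite()-style test. A minimal sketch of
what such a function might look like follows; this is a guess at the shape of
the code, not the actual contents of test.c, and the name finite1 is taken only
from the -D option:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reconstruction: returns 1 if x is finite, 0 for Inf or NaN. */
int finite1(double x)
{
    uint64_t bits;
    uint32_t hi;

    /* Reinterpret the double as a 64-bit integer. This is the step where the
       compiler chooses between a direct register move and a store/load. */
    memcpy(&bits, &x, sizeof bits);

    /* Keep the upper 32 bits: sign, exponent, top of the mantissa. */
    hi = (uint32_t)(bits >> 32);

    /* The exponent field is all ones only for Inf and NaN. Subtracting the
       mask yields 0 in that case and wraps to a value with the top bit set
       otherwise, so bit 31 of the difference is the answer. */
    return (int)(((hi & 0x7ff00000u) - 0x7ff00000u) >> 31);
}
```

Compiling a function like this with -O3 -S, once with and once without
-march=native, may show the same difference in how the double reaches %rax.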



And full /proc/cpuinfo (for the first core):



processor    : 0
vendor_id    : GenuineIntel
cpu family    : 6
model        : 23
model name    : Intel(R) Core(TM)2 Duo CPU     T9600  @ 2.80GHz
stepping    : 10
microcode    : 0xa07
cpu MHz        : 2800.000
cache size    : 6144 KB
physical id    : 0
siblings    : 2
core id        : 0
cpu cores    : 2
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dtherm
tpr_shadow vnmi flexpriority
bogomips    : 5584.90
clflush size    : 64
cache_alignment    : 64
address sizes    : 36 bits physical, 48 bits virtual
power management:
