On Fri, Jan 15, 2016 at 6:11 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Fri, Jan 15, 2016 at 01:36:40PM +0100, Richard Biener wrote:
>> >> My patches only change SSE patterns without ssememalign
>> >> attribute, which defaults to
>> >>
>> >> (define_attr "ssememalign" "" (const_int 0))
>> >
>> > The patch is OK for mainline.
>> >
>> > (subst.md changes can IMO be considered obvious.)
>>
>> This change (r232087 or r232088) is responsible for a drop
>> of 482.sphinx3 on AMD Fam15 (bulldozer) from score 33 to 18.
>>
>> See http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006/recent.html
>
> Yeah, it seems to make a significant difference on code generated with
> -mavx, e.g. in cmn.c with
> -Ofast -quiet -march=bdver2 -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3
> -msse4a -mcx16 -msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mlwp
> -mfma -mfma4 -mxop -mbmi -mno-bmi2 -mtbm -mavx -mno-avx2 -msse4.2 -msse4.1
> -mlzcnt -mno-rtm -mno-hle -mno-rdrnd -mf16c -mno-fsgsbase -mno-rdseed
> -mprfchw -mno-adx -mfxsr -mxsave -mno-xsaveopt -mno-avx512f -mno-avx512er
> -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec
> -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma
> -mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx -mno-clzero -mno-pku
> --param l1-cache-size=16 --param l1-cache-line-size=64 --param
> l2-cache-size=2048 -mtune=bdver2
> Reduced testcase:
>
> -Ofast -mavx -mno-avx2 -mtune=bdver2
>
> float *a, *b;
> int c, d, e, f;
> void
> foo (void)
> {
>   for (; c; c++)
>     a[c] = 0;
>   if (!d)
>     for (; c < f; c++)
>       b[c] = (double) e / b[c];
> }
>
> r232086 vs. r232088 gives. I don't see significant differences before IRA,
> IRA seems to have some cost differences (strange), but the same dispositions,
> and LRA ends up with all the differences.
>

That may be due to the difference between define_memory_constraint
and define_constraint.  LRA doesn't consider a register for
define_constraint if memory is true.

-- 
H.J.
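
For readers unfamiliar with the two forms, here is a minimal machine-description sketch of the distinction mentioned above. The constraint names "Wa" and "Wb" are hypothetical and chosen only for illustration; they are not taken from the patch under discussion, and the sketch is not a proposed fix. Per the GCC internals documentation, define_memory_constraint is meant for constraints that match a subset of memory operands, which reload can make match by converting the operand to the form (mem (reg X)) with X a base register. A plain define_constraint carries no such promise.

```
;; Hypothetical names, for illustration only; not from r232087/r232088.

;; General constraint: the allocator only knows whether the operand
;; satisfies the predicate.
(define_constraint "Wa"
  "@internal Hypothetical memory-like operand, declared as a general constraint."
  (match_operand 0 "memory_operand"))

;; Memory constraint: same predicate, but declared so that reload may
;; satisfy it by loading the address into a base register.
(define_memory_constraint "Wb"
  "@internal Hypothetical memory operand, declared as a memory constraint."
  (match_operand 0 "memory_operand"))
```

Both definitions accept the same operands when tested directly; the difference lies in what the register allocator is told it may do to make an operand match.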