On Fri, Jan 15, 2016 at 6:16 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Fri, Jan 15, 2016 at 6:11 AM, Jakub Jelinek <ja...@redhat.com> wrote:
>> On Fri, Jan 15, 2016 at 01:36:40PM +0100, Richard Biener wrote:
>>> >> My patches only change SSE patterns without ssememalign
>>> >> attribute, which defaults to
>>> >>
>>> >> (define_attr "ssememalign" "" (const_int 0))
>>> >
>>> > The patch is OK for mainline.
>>> >
>>> > (subst.md changes can IMO be considered obvious.)
>>>
>>> This change (r232087 or r232088) is responsible for a drop
>>> of 482.sphinx3 on AMD Fam15 (bulldozer) from score 33 to 18.
>>>
>>> See http://gcc.opensuse.org/SPEC/CFP/sb-megrez-head-64-2006/recent.html
>>
>> Yeah, it seems to make a significant difference on code generated with
>> -mavx, e.g. in cmn.c with
>> -Ofast -quiet -march=bdver2 -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3
>> -msse4a -mcx16 -msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mabm
>> -mlwp -mfma -mfma4 -mxop -mbmi -mno-bmi2 -mtbm -mavx -mno-avx2 -msse4.2
>> -msse4.1 -mlzcnt -mno-rtm -mno-hle -mno-rdrnd -mf16c -mno-fsgsbase
>> -mno-rdseed -mprfchw -mno-adx -mfxsr -mxsave -mno-xsaveopt -mno-avx512f
>> -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt
>> -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl
>> -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx
>> -mno-clzero -mno-pku --param l1-cache-size=16 --param l1-cache-line-size=64
>> --param l2-cache-size=2048 -mtune=bdver2
>> Reduced testcase:
>>
>> -Ofast -mavx -mno-avx2 -mtune=bdver2
>>
>> float *a, *b;
>> int c, d, e, f;
>> void
>> foo (void)
>> {
>>   for (; c; c++)
>>     a[c] = 0;
>>   if (!d)
>>     for (; c < f; c++)
>>       b[c] = (double) e / b[c];
>> }
>>
>> r232086 vs. r232088 gives. I don't see significant differences before IRA,
>> IRA seems to have some cost differences (strange), but the same dispositions,
>> and LRA ends up with all the differences.
>>
>
> That may be due to the difference between define_memory_constraint and
> define_constraint. LRA doesn't consider register for define_constraint if
> memory is true.
>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68991#c14

-- 
H.J.