On Sat, Jan 2, 2016 at 3:58 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> On January 2, 2016 11:32:33 AM GMT+01:00, Uros Bizjak <ubiz...@gmail.com> wrote:
>> On Thu, Dec 31, 2015 at 4:29 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>> On Thu, Dec 31, 2015 at 1:14 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>>> On Wed, Dec 30, 2015 at 9:53 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>> SSE vector arithmetic and logic instructions only accept aligned memory
>>>>> operands.  This patch adds vector_memory_operand and the "Bm" constraint
>>>>> for aligned SSE memory operands.  They are applied to the SSE any_logic
>>>>> patterns.
>>>>>
>>>>> OK for trunk and release branches if there are no regressions?
>>>>
>>>> This patch is just papering over a deeper problem, as Jakub said in the PR [1]:
>>>>
>>>> --q--
>>>> GCC uses the ix86_legitimate_combined_insn target hook to disallow
>>>> misaligned memory in certain SSE instructions.
>>>> (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)
>>>> is not misaligned memory, it is a subreg of a pseudo register, so it is fine.
>>>> If the replacement of the pseudo register with memory happens in some
>>>> other pass, then that pass probably should either use the
>>>> legitimate_combined_insn target hook or some other one.  I think we
>>>> already have a PR where that happens during live range shrinking.
>>>> --/q--
>>>>
>>>> Please figure out where the memory replacement happens.  There are several
>>>> other SSE insns (please grep the .md files for the "ssememalign" attribute)
>>>> that are affected by this problem, so fixing a couple of patterns won't
>>>> solve the problem completely.
>>>
>>> LRA turns
>>>
>>> (insn 64 63 108 6 (set (reg:V4SI 148 [ vect__28.85 ])
>>>         (xor:V4SI (reg:V4SI 149)
>>>             (subreg:V4SI (reg:TI 147 [ MEM[(const struct bitset
>>> &)FeatureEntry_2(D)] ]) 0))) foo.ii:26 3454 {*xorv4si3}
>>>      (expr_list:REG_DEAD (reg:V4SI 149)
>>>         (expr_list:REG_DEAD (reg:TI 147 [ MEM[(const struct bitset
>>> &)FeatureEntry_2(D)] ])
>>>             (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>>>                     (const_int -16 [0xfffffffffffffff0])) [3
>>> MEM[(unsigned int *)&D.2851]+0 S16 A128])
>>>                 (nil)))))
>>>
>>> into
>>>
>>> (insn 64 63 108 6 (set (reg:V4SI 21 xmm0 [orig:148 vect__28.85 ] [148])
>>>         (xor:V4SI (reg:V4SI 21 xmm0 [149])
>>>             (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117])
>>> [6 MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
>>> foo.ii:26 3454 {*xorv4si3}
>>>      (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>>>                 (const_int -16 [0xfffffffffffffff0])) [3 MEM[(unsigned
>>> int *)&D.2851]+0 S16 A128])
>>>         (nil)))
>>>
>>> since
>>>
>>> (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117]) [6
>>> MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])
>>>
>>> satisfies the "m" constraint.  I don't think LRA should call
>>> ix86_legitimate_combined_insn to validate constraints on
>>> an instruction.
>>
>> Hm...
>>
>> If LRA doesn't assume that the generic "m" constraint implies at least
>> natural alignment of the propagated operand, then your patch is the
>> way to go.
>
> I don't think it even considers alignment.  Archs where alignment validity
> depends on the actual instruction should model this with proper constraints.
>
> But in this case, *every* SSE vector memory constraint should be
> changed to Bm.
>
> I'd say so ...
The "Bm" constraint should be applied only to non-move SSE instructions
with 16-byte memory operands.

-- 
H.J.
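For readers following along: in GCC machine-description terms, the predicate/constraint pair discussed above would look roughly like the sketch below. This is not the actual patch from the thread (see the PR referenced above for that); it only illustrates the shape of a predicate that rejects under-aligned vector memory, and a constraint wrapping it.

```
;; Sketch only -- illustrates the idea, not the committed patch.
;; A memory operand whose known alignment is at least the natural
;; alignment of its mode, so it is safe as a non-move SSE operand.
(define_predicate "vector_memory_operand"
  (and (match_operand 0 "memory_operand")
       (match_test "MEM_ALIGN (op) >= GET_MODE_ALIGNMENT (mode)")))

;; A constraint using that predicate, so LRA/reload will not
;; substitute an arbitrary (possibly misaligned) MEM, as the plain
;; "m" constraint allows.
(define_memory_constraint "Bm"
  "@internal Aligned SSE vector memory operand."
  (match_operand 0 "vector_memory_operand"))
```

In an insn pattern, the operand would then use `Bm` instead of `m` (e.g. `"xBm"` in place of `"xm"`), so inputs that cannot be proven naturally aligned are reloaded into a register rather than used directly from memory.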