On 11/28/23 00:50, Richard Biener wrote:


There's no way to distinguish a partial vs. non-partial MEM on RTL and
while without the bogus MEM_ATTR the alias oracle pieces that
miscompiled the original case are fended off we still see the load/store
as full given they have a mode with a size - that for example means
that DSE can elide a previous store to a masked part.  Eventually
that's fended off by using an UNSPEC, but whether the RTL IL has
the correct semantics is questionable.

That said, I did propose scrapping the MEM_EXPR which I think is
the correct thing to do unless we want to put a CALL_EXPR into it
(nothing would use that at the moment) or re-do MEM_EXPR and instead
have an ao_ref (or sth slightly more complete) instead of the current
MEM_ATTRs - but that would be a lot of work.

This leaves the question wrt. semantics of for example x86 mask_store:

(insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
                 (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double>
[(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
         (unspec:V4DF [
                 (reg:V4DI 104 [ mask__16.8 ])
                 (reg:V4DF 105 [ vect_cst__42 ])
                 (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
                         (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4)
double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
             ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256}
      (nil))

it uses a read-modify-write which makes it safe for DSE.
Agreed.


  mask_load
looks like

(insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ])
         (unspec:V4DF [
                 (reg:V4DI 114 [ mask__8.8 ])
                 (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ])
                         (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4)
double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64])
             ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256}
      (nil))
So with the mem:V4DF inside the unspec, ISTM we must treat that as a potential full read, but we can't rely on it being a full read. I don't think UNSPEC semantics are that it must read/consume all its operands in full, just that it might. That might be worth a documentation clarification.



both have (as operand of the UNSPEC) a MEM with V4DFmode (and a
MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads
are _not_ partial.  That means the disambiguation against a store
to an object that's smaller than V4DF is still possible.
Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look
at the mode.  As discussed using a BLKmode MEM _might_ be a way
out but I didn't try what will happen then (patterns would need to
be adjusted I guess).

That said, I'm happy to commit the partial fix, scrapping the
bogus MEM_EXPRs.

OK for that?
Works for me.
jeff

Reply via email to