On Wed, 21 Aug 2024, Robin Dapp wrote: > > > > Why? I don't think the vectorizer relies on a particular else > > > > value? I'd say it would be appropriate for if-conversion to > > > > use "ANY" and for the vectorizer to then pick a supported > > > > version and/or enforce the else value it needs via a blend? > > > > > > In PR115336 we have something like > > > > > > _Bool iftmp.0_113; > > > _Bool iftmp.0_114; > > > iftmp.0_113 = .MASK_LOAD (_170, 8B, _169, _171(D)); > > > iftmp.0_114 = _47 | iftmp.0_113; > > > > > > which assumes zeroing. > > > > I see - is that some trick ifcvt performs? I can't immediately > > see the connection to the PR and it only contains RISC-V assembly > > analysis. > > It happens in predicate_scalar_phi where we build the cond_expr. > > After converting > > iftmp.0_114 = PHI <iftmp.0_113(45), 1(38)> > > we have > > _BoolD.2746 _47; > iftmp.0_114 = _47 ? 1 : iftmp.0_113; > which is folded into > iftmp.0_114 = _47 | iftmp.0_113; > > I should really have documented that and more in the PR already... > So it's not an ifcvt trick but rather match.
_47 was the .MASK_LOAD def, right? It's not exactly obvious what goes wrong - the transform above is correct - it's only "unexpected" for the lanes that were masked. So the actual bug must be downstream from iftmp.0_144. I think one can try to reason on the ifcvt (scalar) code by assuming the .MASK_LOAD def would be undefined. Then we'd have _47(D) ? 1 : iftmp.0_133 -> _47(D) | iftmp.0_133, I think that's at most phishy as the COND_EXPR has a well-defined value while the IOR might spill "undefined" elsewhere causing divergence. Is that what is actually happening? Richard. > Another related case is PR116059. > > > > In order to circumvent that we could use COND_IOR > > > but I suppose that wouldn't be optimized away even on targets that zero > > > masked elements? "ANY" would seem to be wrong here. > > > > What I was trying to say is that of course any transform we perform > > that requires zero-masking should either make .MAKS_LOAD perform > > that or add a COND_EXPR. But it shouldn't be required to make > > all .MASK_LOADs be zero-masked, no? > > > > I'm of course fine if you think that's the best way for RISC-V given > > other targets are likely unaffected as they can perform zero-masking. > > No, the less zeroing the better of course :) > > Richard S's point was to make the COND_EXPR explicit, so that e.g. > a MASK_LOAD (mask, ..., 1) does not appear cheap as cheap as > MASK_LOAD (mask, ..., 0) on zeroing targets. > > From this I kind of jumped to the conclusion (see below) that we might want > it everywhere. > > With the patches as is, ifcvt would enforce the zero here while all other > masked-load occurrences in the vectorizer would just query the target's > preferred else value and simply use that without blend/cond_expr. > > > > What I didn't do (in the posted version, just locally) is an explicit > > > VEC_COND_EXPR after each masked (gather/load lanes) call the vectorizer > > > does. Do we need that? AFAICT loop masking (be it len style or > > > fully-masked style) should be safe. > > > > Well, why should we need that? There seem to be the assumption that > > .MASK_LOAD is zero-masked in very few places (PR115336, but not > > identified there), if we'd assume this everywhere there would be > > way more issues with RISC-V? > > Ok, I was already pretty sure we don't need - and glad to hear it confirmed. > I was just thinking for consistency reasons we might want a masked > load to always look like > foo123 = .MASK_..._LOAD (mask, ..., else) > foo124 = COND_EXPR (mask, foo123, 0); > where foo124 would be optimized away (or not even emitted) for zeroing > targets). That way subsequent code could always rely on zero. > But as this requirement seems very rare it doesn't look like a useful > invariant to enforce. > > All in all, it seems we don't need major changes to the approach. > I'm going to work on the comments for the other patches. > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)