Robin Dapp <[email protected]> writes:
> Hi,
>
> in PR115336 we have the following
>
> vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0,
> ... }, _482, 0);
> vect_iftmp.44 = vect_patt_391 | { 1, ... };
> .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
>
> which assumes that a maskload sets the masked-out elements to zero.
> RVV does not guarantee this and, unfortunately, leaves it to the
> hardware implementation to do basically anything it wants (even keep
> the previous value). In the PR this leads to us reusing a previous
> vector register and stale, nonzero values causing an invalid result.
>
> Based on a Richard Sandiford's feedback I started with the attached
> patch - it's more an RFC in its current shape and obviously not
> tested exhaustively.
>
> The idea is:
> - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
> after a MASK_LOAD if the target requires it.
> - Elide the VCOND_MASK when there is a COND_OP with a replacing else
> value later.
>
> Is this, generally, reasonable or is there a better way?
>
> My initial idea was to add an else value to MASK_LOAD. Richard's
> concern was, though, that this might make nonzero else values
> appear inexpensive when they are actually not.
Yeah, but like I said off-list, I still think an else operand is the way
to go. It would be quite a bit of work to retrofit at this point, but
it feels like the right long-term solution.
FTR, my concern & suggestion was:
I suppose the difficulty is that we might make:
MASK_LOAD (mask, ptr, some-arbitrary-else-value)
seem as cheap as:
MASK_LOAD (mask, ptr, { 0, 0,. ... 0})
which definitely isn't the case for SVE (and I'm guessing also
for AVX). It would be better to keep the:
COND_EXPR (mask, ..., some-arbitrary-else-value)
separate and try to optimise it with surrounding code.
Perhaps we could use the optab's insn predicates to test whether
a given else value is acceptable? We already do something similar
for the gather load arguments.
Since then, there's been discussion about taking the same
predicated-based approach for vec_cmp.
That is, we'd add the else value to the maskload optab and MASK_LOAD
internal function, but use predicates to test whether a target supports
a particular else value.
Thanks,
Richard