https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92772

--- Comment #6 from Andrew Stubbs <ams at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> Btw, isn't the issue that the reduction looks at all lanes?  That is,
> I think the code simply assumes that for fully masked loops at least
> one iteration is performed with all lanes active.  So if you bump
> N to 64 + 32 the test passes on amdgcn?

Yes, only the loads are masked. For most things this works fine, but not for
reductions or permutations, etc.

If I set N=64 and double the input array, then the test does indeed pass.

Masking the load of the {1, 2, 3 .. 63} vector would solve the issue, as would
masking the comparison or the reduction (not that there's an optab for that).
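
To illustrate the failure mode, here is a minimal scalar emulation of one fully
masked vector iteration. It is only a sketch, not the testcase from this PR and
not what the vectorizer emits: the function name last_match, the trip count of
32, the "a[i] < limit" condition, and the assumption that inactive lanes of a
masked load read as zero are all hypothetical choices made for the example.

```c
#include <assert.h>

#define LANES 64  /* vector width; amdgcn vectors have 64 lanes */
#define N 32      /* hypothetical trip count, smaller than LANES */

/* Emulate "index of the last element with a[i] < limit" (1-based, 0 if none)
   as a single masked vector iteration.  Only the load of a[] is masked,
   unless mask_compare is nonzero, in which case the comparison is masked
   as well.  */
static int
last_match (const int *a, int limit, int mask_compare)
{
  int selected[LANES];

  assert (N <= LANES);

  for (int lane = 0; lane < LANES; lane++)
    {
      int active = lane < N;

      /* Masked load: inactive lanes are assumed to read as zero.  */
      int loaded = active ? a[lane] : 0;

      /* Unmasked index vector {1, 2, 3, ...}.  */
      int index = lane + 1;

      int cond = loaded < limit;
      if (mask_compare)
        cond = cond && active;

      selected[lane] = cond ? index : 0;
    }

  /* Unmasked max reduction: looks at all lanes, active or not.  */
  int result = 0;
  for (int lane = 0; lane < LANES; lane++)
    if (selected[lane] > result)
      result = selected[lane];

  return result;
}
```

With an input where every a[i] is 100 except a[5] = 1, and limit = 10, the
unmasked comparison lets the zero-filled inactive lanes match, so the reduction
returns 64 rather than 6. Passing a nonzero mask_compare gives 6. Masking the
index vector or the reduction itself would have the same effect in this sketch.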