https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92772
--- Comment #6 from Andrew Stubbs <ams at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> Btw, isn't the issue that the reduction looks at all lanes? That is,
> I think the code simply assumes that for fully masked loops at least
> one iteration is performed with all lanes active. So if you bump
> N to 64 + 32 the test passes on amdgcn?

Yes, only the loads are masked. For most things this works fine, but not for reductions, permutations, etc.

If I set N=64 and double the input array, then the test does indeed pass.

Masking the load of the {1, 2, 3 .. 63} vector would solve the issue, as would masking the comparison or the reduction (not that there's an optab for that).