https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107096
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Stubbs from comment #4)
> I don't understand rgroups, but I can say that GCN masks are very simply
> one-bit-one-lane. There are always 64-lanes, regardless of the type, so
> V64QI mode has fewer bytes and bits than V64DImode (when written to memory).
>
> This is different to most other architectures where the bit-size remains the
> same and number of lanes varies with the inner type, and has caused us some
> issues with invalid assumptions in GCC (e.g. "there's no need for
> sign-extends in vector registers" is not true for GCN).
>
> However, I think it's the same as you're describing for AVX512, at least in
> this respect.
>
> Incidentally I'm on the cusp of adding multiple "virtual" vector sizes in
> the GCN backend (in lieu of implementing full mask support everywhere in the
> middle-end and fixing all the cost assumptions), so these VIEW_CONVERT_EXPR
> issues are getting worse. I have a bunch of vec_extract patterns that fix up
> some of it. Within the backed, the V32, V16, V8, V4 and V2 vectors are all
> really just 64-lane vectors with the mask preset, so the mask has to remain
> DImode or register allocation becomes tricky.

For the documentation test case GCN seems to skirt the issue by using
different "sized" vectors so it manages to get two different nV here,
one for the float and one for the double rgroup. With AVX512 I get the
same nV and wrong code, re-using the mask of the floats:

  _51 = VIEW_CONVERT_EXPR<vector(4) <signed-boolean:1>>(loop_mask_40);

(but the verifier not ICEing because it only checks modes).

GCN gets nV == 1 for both for

void foo (float *f, double * __restrict d, int n)
{
  for (int i = 0; i < n; ++i)
    {
      f[i] += 1.0f;
      d[i] += 3.0;
    }
}

and here sharing the mask is OK. So it looks like the sharing logic
depends on how we get to assign vector modes - GCN insists on handing
out only 64 lane vectors.
If you changed that and allowed mixing I guess you'd run into similar
issues as AVX512. Only handing out 8-lane vectors would limit AVX512
quite a bit so that doesn't sound like a viable option to us. RVV might
be in the same situation as GCN here.

For GCN with DImode mask vectors, at the point where V_C_Es would be
emitted we could assert that the number of lanes is the same (we probably
should, otherwise we'd have wrong-code). So eventually special-casing
integer mode masks might work out.
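
The lane-count check suggested above can be illustrated with a toy model. This is only a sketch under stated assumptions: `struct lane_mask`, `loop_mask`, `reuse_mask` and `view_convert_unchecked` are hypothetical names, not GCC internals, and "keep the low bits" is an assumed simplification of what a V_C_E between two integer-mode masks amounts to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a one-bit-per-lane mask held in an integer register.
   Hypothetical type for illustration, not a GCC data structure.  */
struct lane_mask {
  uint64_t bits;    /* bit i set means lane i is active */
  unsigned nlanes;  /* number of lanes the mask was built for (<= 64) */
};

/* Mask with the low N bits set, for N in [0, 64].  */
static uint64_t
low_bits (unsigned n)
{
  return n >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << n) - 1;
}

/* Build the loop mask for vector number VEC of an rgroup whose vectors
   have NLANES lanes each, with REMAINING scalar iterations left.  */
static struct lane_mask
loop_mask (unsigned remaining, unsigned nlanes, unsigned vec)
{
  unsigned first = vec * nlanes;
  unsigned active = remaining > first ? remaining - first : 0;
  if (active > nlanes)
    active = nlanes;
  return (struct lane_mask) { low_bits (active), nlanes };
}

/* Share MASK with a vector of NLANES lanes, but only when the lane
   counts agree.  Returns false (and leaves *OUT alone) otherwise.  */
static bool
reuse_mask (struct lane_mask mask, unsigned nlanes, struct lane_mask *out)
{
  if (mask.nlanes != nlanes)
    return false;
  *out = mask;
  return true;
}

/* Assumed model of an unchecked view-convert: reinterpret the same
   bits as a mask with NLANES lanes, keeping only the low bits.  */
static struct lane_mask
view_convert_unchecked (struct lane_mask mask, unsigned nlanes)
{
  return (struct lane_mask) { mask.bits & low_bits (nlanes), nlanes };
}
```

In this model, with 5 scalar iterations left, an 8-lane float mask is 0x1f, while the second 4-lane double vector would need 0x1. The unchecked conversion yields 0xf, so it would enable lanes that must stay inactive. When both rgroups use 64 lanes, as on GCN, `reuse_mask` succeeds and sharing is harmless.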