On Fri, 20 Oct 2023, Andrew Stubbs wrote: > This patch fixes a wrong-code bug on amdgcn in which the excess "ones" in the > mask enable extra lanes that were supposed to be unused and are therefore > undefined. > > Richi suggested an alternative approach involving narrower types and then a > zero-extend to the actual mask type. This solved the problem for the specific > test case that I had, but I'm not sure if it would work with V2 and V4 modes > (not that I've observed bad behaviour from them anyway, but still). There > were some other caveats involving "two-lane division" that I don't fully > understand, so I went with the simpler implementation. > > This patch does have the disadvantage of an additional "and" instruction in > the non-constant case even for machines that don't need it. I'm not sure how > to fix that without an additional target hook. (If GCC could use the 64-lane > vectors more effectively without the assistance of artificially reduced sizes > then this problem wouldn't exist.) > > OK to commit?
- convert_move (target, op0, 0); + rtx tmp = gen_reg_rtx (mode); + convert_move (tmp, op0, 0); + + if (known_ne (TYPE_VECTOR_SUBPARTS (type), + GET_MODE_PRECISION (mode))) Usually this would be maybe_ne, but then ... + { + /* Ensure no excess bits are set. + GCN needs this, AVX does not. */ + expand_binop (mode, and_optab, tmp, + GEN_INT ((1 << (TYPE_VECTOR_SUBPARTS (type) + .to_constant())) - 1), + target, true, OPTAB_DIRECT); here you have .to_constant (). I think with having an integer mode we know subparts is constant so I'd prefer auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant (); if (maybe_ne (GET_MODE_PRECISION (mode), nunits) ... + } + else + emit_move_insn (target, tmp); note you need the emit_move_insn also for the expand_binop path since it's not guaranteed that 'target' is used there. Thus tmp = expand_binop (...) if (tmp != target) emit_move_insn (...) Otherwise looks good to me. The and is needed on x86 for two and four bit masks, it would be more efficient to use smaller modes for the sign-extension I guess. Thanks, Richard.