https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ams at gcc dot gnu.org

--- Comment #35 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Sandiford from comment #34)
> (In reply to Richard Biener from comment #32)
> > Btw, AVX512 knotb will invert all 8 bits and there's no knot just
> > affecting the lowest 4 or 2 bits.
> >
> > It all feels like disaster waiting to happen ;)
> Yes :)
>
> > For example BIT_NOT_EXPR is RTL expanded like
> >
> >     case BIT_NOT_EXPR:
> >       op0 = expand_expr (treeop0, subtarget,
> >                          VOIDmode, EXPAND_NORMAL);
> >       if (modifier == EXPAND_STACK_PARM)
> >         target = 0;
> >       /* In case we have to reduce the result to bitfield precision
> >          for unsigned bitfield expand this as XOR with a proper constant
> >          instead.  */
> >       if (reduce_bit_field && TYPE_UNSIGNED (type))
> >         {
> >           int_mode = SCALAR_INT_TYPE_MODE (type);
> >           wide_int mask = wi::mask (TYPE_PRECISION (type),
> >                                     false, GET_MODE_PRECISION (int_mode));
> >
> >           temp = expand_binop (int_mode, xor_optab, op0,
> >                                immed_wide_int_const (mask, int_mode),
> >                                target, 1, OPTAB_LIB_WIDEN);
> >
> > so we could, for VECTOR_BOOLEAN_TYPE_P with integer mode and
> > effective bit-precision, set reduce_bit_field and fix up the fallout
> > (not sure why the above is only for TYPE_UNSIGNED).
> >
> > At least it feels similar, and doing things the opposite way for
> > vectors (fixing up at uses) would be odd?
> Do you know why we take this approach for integers?  Is it for
> correctness?  Or is it supposed to be more optimal?

It's done for correctness.  The main thing was bitfields wider than int,
which end up as bit-precision types in GIMPLE.

> I can imagine that, for arithmetic types, there are going to be many
> more instances where the upper bits matter (division, right shifts,
> MIN/MAX, etc.).  So perhaps reducing every result is a good
> trade-off there.
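The XOR-with-mask reduction quoted above can be sketched in plain C (illustrative only, not GCC source; the helper name and values are made up):

```c
#include <stdint.h>

/* Sketch of reducing BIT_NOT_EXPR to bit-field precision.  For an
   unsigned value with PREC significant bits held in an 8-bit mode,
   a plain NOT also sets the padding bits; XOR with the precision
   mask (the wi::mask analogue) inverts only the low PREC bits,
   which is what the expander emits when reduce_bit_field is set.  */

/* Hypothetical helper; 'prec' must be less than 8.  */
static inline uint8_t
bit_not_reduced (uint8_t x, unsigned prec)
{
  uint8_t mask = (uint8_t) ((1u << prec) - 1);  /* e.g. 0x0f for prec 4 */
  return x ^ mask;  /* equals (~x) & mask when x's padding bits are zero */
}
```

For a 4-lane QImode mask 0x05, a plain (uint8_t)~0x05 yields 0xfa with the padding bits polluted, while bit_not_reduced (0x05, 4) yields 0x0a with the padding kept clear.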
I think it was the easiest place to fix up and to make sure later RTL
opts and backends do not interfere.

> But there's an argument that it should be rare for the padding
> bits in a vector to matter, since very few things would look at the
> padding bits anyway.  So perhaps the cost should be borne by the
> operations that need canonical integers.

But what happens when an operation not needing canonical integers is
transformed, say by combine or simplify-rtx, into one that does need
them?  I think the reduce_bit_field code was trying to be safe here.
Actual define_insns not needing canonical padding could do what we now
require for scalar shifts - match a variant with the mask
canonicalization op and hope for combine eliminating it.  Of course
that will explode in case it's the majority of cases ...

> Not a strong opinion though, more just devil's advocate.

Yeah.  The important thing seems to be that the info that this is a
bit-precision QImode isn't lost ... which points to using plain
integer modes having been a very bad choice ...

> There again, if e.g. the x86 API guarantees memcmp equality between
> two masks whose significant bits are equal, then we probably have
> no choice.

That's a good question, possibly most relevant for the OpenMP SIMD ABI.
The AVX512 APIs use integer types everywhere; there's no intrinsic for
ktest itself, but there are _mm512_kortestz and _mm512_kortestc, and
also _mm512_knot (oddly only for HImode).  So - from a quick look - it
seems to be "broken" in the intrinsic API as well: we have
_mm_cmp_sd_mask producing a QImode mask and _mm512_knot inverting too
many bits, so the user must be aware of the "padding".

One could argue it's the vectorizer's job to fix up then, but of course
it doesn't get to see the correct vector types here.

Given we have the fixup in CTOR expansion, the issue at hand could be
fixed by mirroring that in VECTOR_CST expansion for now.
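The memcmp point can be made concrete with a minimal C sketch (plain integers, no real intrinsics; the names and constants are made up): two QImode masks that agree on their two significant lanes but differ in padding are not bytewise equal, and a knot-style full-byte NOT turns a canonical mask into a non-canonical one.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the padding hazard: a 2-lane mask lives in the low 2 bits
   of a QImode byte; the other 6 bits are padding.  */
enum { LANE_MASK = 0x03 };

/* A knotb-style NOT inverts the whole byte, padding included.  */
static inline uint8_t
knot_all_bits (uint8_t m)
{
  return (uint8_t) ~m;
}

/* Bytewise mask comparison, as a memcmp-based ABI check would do it.  */
static inline int
masks_memcmp_equal (uint8_t a, uint8_t b)
{
  return memcmp (&a, &b, sizeof a) == 0;
}
```

With canonical = 0x03 and sloppy = 0xf3, both masks have the same significant lanes ((m & LANE_MASK) agrees), yet masks_memcmp_equal reports them unequal; and knot_all_bits (0x03) produces 0xfc, setting all six padding bits.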
Andrew - I suppose GCN also has cbranch; can you check what happens for < 8 lane vector modes there?