https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ams at gcc dot gnu.org

--- Comment #35 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Sandiford from comment #34)
> (In reply to Richard Biener from comment #32)
> > Btw, AVX512 knotb will invert all 8 bits and there's no knot just
> > affecting the lowest 4 or 2 bits.
> >
> > It all feels like disaster waiting to happen ;)
> Yes :)
>
> > For example BIT_NOT_EXPR is RTL expanded like
> >
> >     case BIT_NOT_EXPR:
> >       op0 = expand_expr (treeop0, subtarget,
> >                          VOIDmode, EXPAND_NORMAL);
> >       if (modifier == EXPAND_STACK_PARM)
> >         target = 0;
> >       /* In case we have to reduce the result to bitfield precision
> >          for unsigned bitfield expand this as XOR with a proper constant
> >          instead.  */
> >       if (reduce_bit_field && TYPE_UNSIGNED (type))
> >         {
> >           int_mode = SCALAR_INT_TYPE_MODE (type);
> >           wide_int mask = wi::mask (TYPE_PRECISION (type),
> >                                     false, GET_MODE_PRECISION (int_mode));
> >
> >           temp = expand_binop (int_mode, xor_optab, op0,
> >                                immed_wide_int_const (mask, int_mode),
> >                                target, 1, OPTAB_LIB_WIDEN);
> >
> > so we could, for VECTOR_BOOLEAN_TYPE_P with integer mode and
> > effective bit-precision, set reduce_bit_field and fix up the fallout
> > (not sure why the above is only for TYPE_UNSIGNED).
> >
> > At least it feels similar, and doing things the opposite way for
> > vectors (fixing up at uses) would be odd?
> Do you know why we take this approach for integers?  Is it for
> correctness?  Or is it supposed to be more optimal?

It's done for correctness.  The main thing was bitfields wider than int,
which end up as bit-precision types in GIMPLE.

> I can imagine that, for arithmetic types, there are going to be many
> more instances where the upper bits matter (division, right shifts,
> MIN/MAX, etc.).  So perhaps reducing every result is a good
> trade-off there.
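The XOR-with-mask reduction quoted above can be sketched in plain C (illustrative only, not GCC source; the helper name and values are made up):

```c
#include <stdint.h>

/* Sketch of reducing BIT_NOT_EXPR to bit-field precision.  For an
   unsigned value with PREC significant bits held in an 8-bit mode,
   a plain NOT also sets the padding bits; XOR with the precision
   mask (the wi::mask analogue) inverts only the low PREC bits,
   which is what the expander emits when reduce_bit_field is set.  */

/* Hypothetical helper; 'prec' must be less than 8.  */
static inline uint8_t
bit_not_reduced (uint8_t x, unsigned prec)
{
  uint8_t mask = (uint8_t) ((1u << prec) - 1);  /* e.g. 0x0f for prec 4 */
  return x ^ mask;  /* equals (~x) & mask when x's padding bits are zero */
}
```

For a 4-lane QImode mask 0x05, a plain (uint8_t)~0x05 yields 0xfa with the padding bits polluted, while bit_not_reduced (0x05, 4) yields 0x0a with the padding kept clear.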
I think it was the easiest place to fix up and to make sure later RTL
opts and backends do not interfere.

> But there's an argument that it should be rare for the padding
> bits in a vector to matter, since very few things would look at the
> padding bits anyway.  So perhaps the cost should be borne by the
> operations that need canonical integers.

But what happens when an operation not needing canonical integers is
transformed, say by combine or simplify-rtx, into one that does need
them?  I think the reduce_bit_field code was trying to be safe here.
Actual define_insns not needing canonical padding could do what we now
require for scalar shifts - match a variant with the mask
canonicalization op and hope for combine eliminating it.  Of course
that will explode in case it's the majority of cases ...

> Not a strong opinion though, more just devil's advocate.

Yeah.  The important thing seems to be that the info that this is a
bit-precision QImode isn't lost ... which points to using plain
integer modes having been a very bad choice ...

> There again, if e.g. the x86 API guarantees memcmp equality between
> two masks whose significant bits are equal, then we probably have
> no choice.

That's a good question, possibly most relevant for the OpenMP SIMD ABI.
The AVX512 APIs use integer types everywhere; there's no intrinsic for
ktest itself, but there are _mm512_kortestz and _mm512_kortestc, and
also _mm512_knot (oddly only for HImode).  So - from a quick look - it
seems to be "broken" in the intrinsic API as well: we have
_mm_cmp_sd_mask producing a QImode mask and _mm512_knot inverting too
many bits, so the user must be aware of the "padding".

One could argue it's the vectorizer's job to fix up then, but of course
it doesn't get to see the correct vector types here.

Given we have the fixup in CTOR expansion, the issue at hand could be
fixed by mirroring that in VECTOR_CST expansion for now.
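The memcmp point can be made concrete with a minimal C sketch (plain integers, no real intrinsics; the names and constants are made up): two QImode masks that agree on their two significant lanes but differ in padding are not bytewise equal, and a knot-style full-byte NOT turns a canonical mask into a non-canonical one.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the padding hazard: a 2-lane mask lives in the low 2 bits
   of a QImode byte; the other 6 bits are padding.  */
enum { LANE_MASK = 0x03 };

/* A knotb-style NOT inverts the whole byte, padding included.  */
static inline uint8_t
knot_all_bits (uint8_t m)
{
  return (uint8_t) ~m;
}

/* Bytewise mask comparison, as a memcmp-based ABI check would do it.  */
static inline int
masks_memcmp_equal (uint8_t a, uint8_t b)
{
  return memcmp (&a, &b, sizeof a) == 0;
}
```

With canonical = 0x03 and sloppy = 0xf3, both masks have the same significant lanes ((m & LANE_MASK) agrees), yet masks_memcmp_equal reports them unequal; and knot_all_bits (0x03) produces 0xfc, setting all six padding bits.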
Andrew - I suppose GCN also has cbranch; can you check what happens for < 8 lane vector modes there?