On Thu, Apr 06, 2023 at 12:51:20PM +0200, Eric Botcazou wrote:
> > If we want to fix it in the combiner, I think the fix would be following.
> > The optimization is about
> > (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> > and IMHO we can only optimize it into
> > (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> > if we know that the upper bits of the REG are zeros.
>
> The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation
> is done on the full word register, in other words that it's in effect:
>
> (subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)
>
> that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.

If the
(and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
to
(subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
transformation is kosher for WORD_REGISTER_OPERATIONS, then I guess the
invalid operation is in
simplify_context::simplify_binary_operation_1
    case AND:
...
      if (HWI_COMPUTABLE_MODE_P (mode))
        {
          HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
          HOST_WIDE_INT nzop1;
          if (CONST_INT_P (trueop1))
            {
              HOST_WIDE_INT val1 = INTVAL (trueop1);
              /* If we are turning off bits already known off in OP0, we need
                 not do an AND.  */
              if ((nzop0 & ~val1) == 0)
                return op0;
            }
There we have op0 == trueop0 (reg:HI 175) and op1 == trueop1 (const_int 2124
[0x84c]).
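
To make the quoted test concrete, here is a tiny standalone sketch (not GCC
code; the helper name is hypothetical and the masks passed to it are made-up
examples, since the actual nonzero_bits result for reg:HI 175 isn't shown
here). It only restates the (nzop0 & ~val1) == 0 condition:

```c
#include <stdint.h>

/* Hypothetical stand-in for the quoted check.  NZOP0 is a mask of the
   bits that may be nonzero in op0, VAL1 is the AND constant.  The AND
   is redundant when it clears no bit that could be set in op0.  */
static int
and_is_redundant (uint64_t nzop0, uint64_t val1)
{
  return (nzop0 & ~val1) == 0;
}
```

So if every possibly-set bit of op0 lies within 0x84c, the AND is dropped and
op0 is returned unchanged; if op0 could have, say, any of its low 16 bits set,
the AND stays.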
For integral(?) modes narrower than word_mode we would then need to
actually check nonzero_bits in word_mode (on a paradoxical subreg of
trueop0?).  If INTVAL (trueop1) is >= 0, then I think just calling
nonzero_bits in the wider mode would be all we need (although the
subsequent (nzop1 & nzop0) == 0 case probably wants to keep the current
nonzero_bits calls).  I'm not really sure what an AND with a constant that
has the most significant bit set means for the upper bits under
WORD_REGISTER_OPERATIONS.
So perhaps, just in the return op0; case, add further code for
WORD_REGISTER_OPERATIONS and sub-word modes that calls nonzero_bits
again in word_mode and decides whether dropping the AND is still safe.
> > Now, this patch fixes the PR, but certainly generates worse (but correct)
> > code than the dse.cc patch. So perhaps we want both of them?
>
> What happens if you disable the step I mentioned (patchlet attached)?
That patch doesn't change anything at all on the testcase, it is still
miscompiled.
Jakub