On Thu, Apr 06, 2023 at 12:51:20PM +0200, Eric Botcazou wrote:
> > If we want to fix it in the combiner, I think the fix would be following.
> > The optimization is about
> > (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> > and IMHO we can only optimize it into
> > (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> > if we know that the upper bits of the REG are zeros.
>
> The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND
> operation is done on the full word register, in other words that it's in
> effect:
>
> (subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)
>
> that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.

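To spell out the equivalence claimed above, here is a small standalone C
model.  It is only a sketch and not GCC code: the word_reg type and all
function names are made up for illustration, and it assumes a 32-bit word
whose low 16 bits hold the HImode value while the upper 16 bits may hold
anything.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit word register holding an HImode value
   in its low half; the upper half is arbitrary.  */
typedef uint32_t word_reg;

/* (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))  */
uint32_t
and_si_of_subreg (word_reg r)
{
  return r & UINT32_C (0x84c);
}

/* (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0), modelled under
   the assumption that the HImode AND is carried out on the whole word,
   i.e. as an SImode AND of the full register with 0x84c.  */
uint32_t
subreg_of_and_hi_wro (word_reg r)
{
  word_reg full = r & UINT32_C (0x84c);  /* AND done on the full word.  */
  return full;                           /* Subreg reads the whole word.  */
}

/* Both forms agree whatever the upper half of the register contains.  */
void
check_equivalence (void)
{
  const word_reg samples[] = { 0x0000084cu, 0x1234ffffu, 0xffff0000u };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (and_si_of_subreg (samples[i])
            == subreg_of_and_hi_wro (samples[i]));
}
```

In this model the two forms agree only because the AND is still present;
the upper half of the result is cleared by the constant, not by any
property of the register itself.
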
If the
(and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
to
(subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
transformation is kosher for WORD_REGISTER_OPERATIONS, then I guess the
invalid operation is in
simplify_context::simplify_binary_operation_1
    case AND:
...
      if (HWI_COMPUTABLE_MODE_P (mode))
        {
          HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
          HOST_WIDE_INT nzop1;
          if (CONST_INT_P (trueop1))
            {
              HOST_WIDE_INT val1 = INTVAL (trueop1);
              /* If we are turning off bits already known off in OP0, we need
                 not do an AND.  */
              if ((nzop0 & ~val1) == 0)
                return op0;
            }
We have there op0==trueop0 (reg:HI 175) and op1==trueop1
(const_int 2124 [0x84c]).
For (integral?) modes smaller than word_mode we would then need to actually
check nonzero_bits in word_mode (on a paradoxical subreg of trueop0?).
If INTVAL (trueop1) is >= 0, then I think just doing nonzero_bits in the
wider mode would be all we need (although the subsequent
(nzop1 & nzop0) == 0 case probably wants to keep the current nonzero_bits
calls).  I'm not really sure what, under WORD_REGISTER_OPERATIONS, an AND
with a constant which has the most significant bit set means for the upper
bits.

So, perhaps just in the return op0; case add further code for
WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
again for the word mode and decide if it is still safe.

> > Now, this patch fixes the PR, but certainly generates worse (but correct)
> > code than the dse.cc patch.  So perhaps we want both of them?
>
> What happens if you disable the step I mentioned (patchlet attached)?

That patch doesn't change anything at all on the testcase, it is still
miscompiled.

	Jakub
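
To make the "call nonzero_bits again for the word mode" idea above a bit
more concrete, here is a rough standalone C sketch of the decision.  It is
not a patch and does not use GCC's types or APIs: and_is_redundant and its
parameters are hypothetical stand-ins, with the two nonzero_bits results
passed in as plain masks.  Negative constants are conservatively left
alone, since it is unclear what they mean for the upper bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "return op0" test in the AND case.
   nz_mode: possibly-nonzero bits of op0 as seen in its own mode.
   nz_word: possibly-nonzero bits of op0 as seen in word_mode.
   val1:    the constant operand of the AND.  */
bool
and_is_redundant (uint64_t nz_mode, uint64_t nz_word, int64_t val1,
                  bool word_register_operations, bool subword_mode)
{
  /* Existing test: the AND only clears bits already known to be off.  */
  if ((nz_mode & ~(uint64_t) val1) != 0)
    return false;

  if (word_register_operations && subword_mode)
    {
      /* Assumption: keep the AND for constants with the sign bit set.  */
      if (val1 < 0)
        return false;

      /* Extra test: the upper bits of the word must be known zero too,
         otherwise dropping the AND exposes them.  */
      if ((nz_word & ~(uint64_t) val1) != 0)
        return false;
    }

  return true;
}
```

With illustrative masks (not taken from the testcase) nz_mode = 0x84c and
nz_word = 0xffff084c, and_is_redundant (0x84c, 0xffff084c, 0x84c, true,
true) is false, so the AND is kept; with nz_word = 0x84c it is true and
the AND can still be dropped.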