https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121306

--- Comment #14 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:3e6e885beb7097c5c5ee2c48ddb3b0e61f3a1fc7

commit r16-3124-g3e6e885beb7097c5c5ee2c48ddb3b0e61f3a1fc7
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Mon Aug 11 09:24:10 2025 +0100

    simplify-rtx: Distribute some non-narrowing subregs [PR121306]

    In g:965564eafb721f8000013a3112f1bba8d8fae32b I'd added code
    to try distributing non-widening subregs through logic ops,
    in cases where that would eliminate a term of the logic op.

    For "reasons", this indirectly caused combine to generate:

      (set (zero_extract:SI (reg/v:SI 101 [ a ])
              (const_int 8 [0x8])
              (const_int 8 [0x8]))
          (not:SI (sign_extract:SI (reg:SI 107 [ b ])
                  (const_int 8 [0x8])
                  (const_int 8 [0x8]))))

    instead of:

      (set (zero_extract:SI (reg/v:SI 101 [ a ])
              (const_int 8 [0x8])
              (const_int 8 [0x8]))
          (subreg:SI (not:QI (subreg:QI (sign_extract:SI (reg:SI 107 [ b ])
                          (const_int 8 [0x8])
                          (const_int 8 [0x8])) 0)) 0))

    for some tests that were intended to match x86's *one_cmplqi_ext<mode>_1
    (see g:a58d770fa1d17ead3c38417b299cce3f19f392db).  However, other more
    direct ways of generating the pattern continued to have the unsimplified
    (subreg:SI (not:QI (subreg:QI (...:SI ...)))) structure, since that
    structure wasn't the focus of the original patch.
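
    As a rough illustration of why the two forms above are interchangeable
    as the source of the 8-bit zero_extract store, here is a small C model.
    It is a sketch only: the helper names are hypothetical, it assumes a
    little-endian lowpart (subreg byte 0), and it models the undefined upper
    bits of the paradoxical subreg as zero, which the store ignores anyway.

```c
#include <assert.h>
#include <stdint.h>

/* Models (sign_extract:SI b 8 8): bits 8..15 of B, sign-extended to 32 bits.  */
static uint32_t sign_extract_8_8 (uint32_t b)
{
  uint32_t field = (b >> 8) & 0xff;
  return (field ^ 0x80) - 0x80;
}

/* Models (not:SI x), the simplified source.  */
static uint32_t not_si (uint32_t x)
{
  return ~x;
}

/* Models (subreg:SI (not:QI (subreg:QI x 0)) 0).  Only the low byte is
   defined; the upper bits are modelled as zero here (an assumption).  */
static uint32_t not_qi_of_low_byte (uint32_t x)
{
  return (uint8_t) ~(uint8_t) x;
}

/* Models (set (zero_extract:SI a 8 8) src): only the low 8 bits of SRC
   are written, into bits 8..15 of A.  */
static uint32_t store_8_8 (uint32_t a, uint32_t src)
{
  return (a & ~UINT32_C (0xff00)) | ((src & 0xff) << 8);
}

/* Returns nonzero if both source forms store the same value into A.  */
static int forms_agree (uint32_t a, uint32_t b)
{
  uint32_t x = sign_extract_8_8 (b);
  return store_8_8 (a, not_si (x)) == store_8_8 (a, not_qi_of_low_byte (x));
}

/* Exhaustively checks every possible value of the 8-bit field in B.  */
static void check_all_fields (void)
{
  for (uint32_t field = 0; field < 256; field++)
    assert (forms_agree (UINT32_C (0x12345678), field << 8));
}
```

    Calling check_all_fields () passes for all 256 field values, since the
    store only ever looks at the low byte of its source.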

    This patch tries to tackle that simplification head-on.  It's another
    case of distributing subregs, but this time for non-narrowing rather
    than non-widening subregs.  We already do the same distribution for
    word_mode:

      /* Attempt to simplify WORD_MODE SUBREGs of bitwise expressions.  */
      if (outermode == word_mode
          && (GET_CODE (op) == IOR || GET_CODE (op) == XOR || GET_CODE (op) == AND)
          && SCALAR_INT_MODE_P (innermode))
        {
          rtx op0 = simplify_subreg (outermode, XEXP (op, 0), innermode, byte);
          rtx op1 = simplify_subreg (outermode, XEXP (op, 1), innermode, byte);
          if (op0 && op1)
            return simplify_gen_binary (GET_CODE (op), outermode, op0, op1);
        }

    which g:0340177d54d08b6375391ba164a878e6a596275e extended to NOT.
    For word_mode, there are (reasonably) no restrictions on the inner
    mode other than that it is an integer.  Doing word_mode logic ops
    should be at least as efficient as subword logic ops (if the target
    provides subword ops at all).  And word_mode logic ops should be
    cheaper than multi-word logic ops.

    But here we need the distribution for SImode rather than word_mode
    (DImode).  The patch therefore extends the word_mode distributions
    to non-narrowing subregs in which the two modes occupy the same
    number of words.  This should hopefully be relatively conservative.
    It prevents the new rule from going away from word_mode, and attempting
    to convert (say) a QImode subreg of a word_mode AND into a QImode AND.
    It should be suitable for both CISCy and RISCy targets, including
    those that define WORD_REGISTER_OPERATIONS.
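
    The size condition described above can be modelled roughly as follows.
    This is a sketch of the prose only, not the code added to
    simplify-rtx.cc (which is not quoted in this message); the function
    names and the byte-size parameters are hypothetical stand-ins for the
    mode sizes and UNITS_PER_WORD.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of words needed to hold SIZE bytes, rounding up.  */
static unsigned words_for (unsigned size, unsigned word_bytes)
{
  return (size + word_bytes - 1) / word_bytes;
}

/* Hypothetical predicate: the subreg is non-narrowing and the outer and
   inner modes occupy the same number of words.  */
static bool may_distribute (unsigned outer_bytes, unsigned inner_bytes,
                            unsigned word_bytes)
{
  return (outer_bytes >= inner_bytes
          && (words_for (outer_bytes, word_bytes)
              == words_for (inner_bytes, word_bytes)));
}

/* A few cases for a 64-bit target (8-byte words) and a 32-bit target.  */
static void check_examples (void)
{
  assert (may_distribute (4, 1, 8));    /* SImode subreg of a QImode op */
  assert (may_distribute (8, 1, 8));    /* word_mode subreg of a QImode op */
  assert (!may_distribute (1, 8, 8));   /* QImode subreg of a word_mode op */
  assert (!may_distribute (16, 8, 8));  /* one word widened to two words */
  assert (!may_distribute (8, 4, 4));   /* DImode of SImode, 32-bit words */
}
```

    Under this model, the SImode-of-QImode case from the PR is accepted on
    a 64-bit target, while the QImode subreg of a word_mode AND mentioned
    above is rejected because it narrows.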

    The patch also fixes some overlong lines in related code.

    gcc/
            PR rtl-optimization/121306
            * simplify-rtx.cc (simplify_context::simplify_subreg): Distribute
            non-narrowing integer-to-integer subregs through logic ops,
            in a similar way to the existing word_mode handling.
