https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119707

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:b57d7ef4bdda8f939d804bfe40123cb9e4b447b3

commit r15-9376-gb57d7ef4bdda8f939d804bfe40123cb9e4b447b3
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Fri Apr 11 08:27:55 2025 +0200

    bitintlower: Fix up handling of nested casts in m_upward_2limbs cases
[PR119707]

    The following testcase is miscompiled I believe starting with
    PR112941 r14-6742.  That commit fixed the bitint-55.c testcase.
    The m_first initialization for such conversion initializes 2 SSA_NAMEs,
    one is PHI result on the loop (m_data[save_data_cnt]) and the other
    (m_data[save_data_cnt+1]) is the argument of that PHI from the latch
    edge initialized somewhere in the loop.  Both of these are used to
    propagate sign extension (i.e. either 0 or all ones limb) from the
    iteration with the sign bit of a narrower type to following iterations.
    The bitint-55.c testcase was ICEing with invalid SSA forms as it was
    using unconditionally the PHI argument SSA_NAME even in places which
    weren't dominated by that.  And the code which was touched is about
    handling constant idx, so if e.g. there are nested casts and the
    outer one does conditional code based on index comparison with
    a particular constant index.
    In the following testcase there are 2 nested casts, one from signed
    _BitInt(129) to unsigned _BitInt(255) and the outer from unsigned
    _BitInt(255) to unsigned _BitInt(256).  The m_upward_2limbs case which
    is used for handling mergeable arithmetics (like +-|&^ and casts etc.)
    one loop iteration handles 2 limbs, the first half the even ones, the
    second half the odd ones.
    And for these 2 conversions, the special one for the inner conversion
    on x86_64 is with index 2 where the sign bit of _BitInt(129) is present,
    while for the outer one index 3 where we need to mask off the most
    significant bit.
    The r15-6742 change started using m_data[save_data_cnt] for all constant
    indexes if it is still inside of the loop (and it is sign extension).
    But that doesn't work correctly for the case where the inner conversion
    produces the sign extension limb in the loop for an even index and
    the outer conversion needs to special case the immediately next conversion,
    because in that case using the PHI result will see still 0 there rather
    than the updated value from the handling of previous limb.
    So the following patch special cases this and uses the other SSA_NAME.

    Commented IL, trying to lower
      _1 = (unsigned _BitInt(255)) y_4(D);
      _2 = (unsigned _BitInt(256)) _1;
      _3 = _2 + x_5(D);
      <retval> = _3;
    we were emitting
      <bb 3> [local count: 1073741824]:
      # _8 = PHI <0(2), _9(12)>     // This is the limb index
      # _10 = PHI <0(2), _11(12)>   // Sign extension limb from inner cast (0
or ~0UL)
      # _22 = PHI <0(2), _23(12)>   // Overflow bit from addition of previous
limb
      if (_8 <= 2)
        goto <bb 4>; [80.00%]
      else
        goto <bb 7>; [20.00%]

      <bb 4> [local count: 1073741824]:
      if (_8 == 2)
        goto <bb 6>; [20.00%]
      else
        goto <bb 5>; [80.00%]

      <bb 5> [local count: 1073741824]:
      _12 = VIEW_CONVERT_EXPR<unsigned long[3]>(y)[_8];     // Full limbs in y
      goto <bb 7>; [100.00%]

      <bb 6> [local count: 214748360]:
      _13 = MEM <unsigned long> [(_BitInt(129) *)&y + 16B]; // y[2] which
      _14 = (<unnamed-signed:1>) _13;                       // needs to be
      _15 = (unsigned long) _14;                            // sign extended
      _16 = (signed long) _15;                              // to full
      _17 = _16 >> 63;                                      // limb
      _18 = (unsigned long) _17;

      <bb 7> [local count: 1073741824]:
      # _19 = PHI <_12(5), _10(3), _15(6)>  // Limb to add for result of casts
      # _20 = PHI <0(5), _10(3), _18(6)>    // Sign extension limb from
previous limb
      _11 = _20;                            // PHI _10 argument above
      _21 = VIEW_CONVERT_EXPR<unsigned long[4]>(x)[_8];
      _24 = .UADDC (_19, _21, _22);
      _25 = IMAGPART_EXPR <_24>;
      _26 = REALPART_EXPR <_24>;
      VIEW_CONVERT_EXPR<unsigned long[4]>(<retval>)[_8] = _26;
      _27 = _8 + 1;
      if (_27 == 3)                 // For the outer cast limb 3 is special
        goto <bb 11>; [20.00%]
      else
        goto <bb 8>; [80.00%]

      <bb 8> [local count: 1073741824]:
      if (_27 < 2)
        goto <bb 9>; [80.00%]
      else
        goto <bb 10>; [20.00%]

      <bb 9> [local count: 1073741824]:
      _28 = VIEW_CONVERT_EXPR<unsigned long[3]>(y)[_27];    // These are used
in full

      <bb 10> [local count: 1073741824]:
      # _29 = PHI <_28(9), _11(8)>
      goto <bb 12>; [100.00%]

      <bb 11> [local count: 214748360]:
    // And HERE is the actual bug.  Using _10 for idx 3 will mean it is always
    // zero there and doesn't contain the _18 value propagated to it.
    // It should be
    // _30 = (<unnamed-unsigned:63>) _11;
    // Now if the outer conversion had special iteration say 5, we could
    // have used _10 fine here, by that time it already propagates through
    // the PHI.
      _30 = (<unnamed-unsigned:63>) _10;
      _31 = (unsigned long) _30;

      <bb 12> [local count: 1073741824]:
      # _32 = PHI <_29(10), _31(11)>
      _33 = VIEW_CONVERT_EXPR<unsigned long[4]>(x)[_27];
      _34 = .UADDC (_32, _33, _25);
      _23 = IMAGPART_EXPR <_34>;
      _35 = REALPART_EXPR <_34>;
      VIEW_CONVERT_EXPR<unsigned long[4]>(<retval>)[_27] = _35;
      _9 = _8 + 2;
      if (_9 != 4)
        goto <bb 3>; [0.05%]
      else
        goto <bb 13>; [99.95%]

    2025-04-11  Jakub Jelinek  <ja...@redhat.com>

            PR tree-optimization/119707
            * gimple-lower-bitint.cc (bitint_large_huge::handle_cast): Only use
            m_data[save_data_cnt] instead of m_data[save_data_cnt + 1] if
            idx is odd and equal to low + 1.  Remember tree_to_uhwi (idx) in
            a temporary instead of calling the function multiple times.

            * gcc.dg/torture/bitint-76.c: New test.

Reply via email to