On Mon, Jan 6, 2025 at 2:12 PM Richard Sandiford
<[email protected]> wrote:
>
> g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html ,
> added code to treat:
>
> (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0)))
>
> as a nop. This PR shows that that isn't always correct.
> The compare in the set above is between two 0/1 booleans (at least
> on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that
> produced the incoming (reg:CC cc) is unconstrained; it could be between
> arbitrary integers, or even floats. The fold is therefore replacing a
> cc that is valid for both signed and unsigned comparisons with one that
> is only known to be valid for signed comparisons.
>
> (gt (compare (gt cc 0) (lt cc 0) 0)
>
> does simplify to:
>
> (gt cc 0)
>
> but:
>
> (gtu (compare (gt cc 0) (lt cc 0) 0)
>
> does not simplify to:
>
> (gtu cc 0)
>
> The optimisation didn't come with a testcase, but it was added for
> i386's cmpstrsi, now cmpstrnsi. That probably doesn't matter as much
> as it once did, since it's now conditional on -minline-all-stringops.
> But the patch is almost 25 years old, so whatever the original
> motivation was, it seems likely that other things now rely on it.
>
> It therefore seems better to try to preserve the optimisation on rtl
> rather than get rid of it. To do that, we need to look at how the
> result of the outer compare is used. We'd therefore be looking at four
> instructions (the gt, the lt, the compare, and the use of the compare),
> but combine already allows that for 3-instruction combinations thanks
> to:
>
> /* If the source is a COMPARE, look for the use of the comparison result
> and try to simplify it unless we already have used undobuf.other_insn.
> */
>
> When applied to boolean inputs, a comparison operator is
> effectively a boolean logical operator (AND, ANDNOT, XOR, etc.).
> simplify_logical_relational_operation already had code to simplify
> logical operators between two comparison results, but:
>
> * It only handled IOR, which doesn't cover all the cases needed here.
> The others are easily added.
>
> * It treated comparisons of integers as having an ORDERED/UNORDERED result.
> Therefore:
>
> * it would not treat "true for LT + EQ + GT" as "always true" for
> comparisons between integers, because the mask excluded the UNORDERED
> condition.
>
> * it would try to convert "true for LT + GT" into LTGT even for comparisons
> between integers. To prevent an ICE later, the code used:
>
> /* Many comparison codes are only valid for certain mode classes. */
> if (!comparison_code_valid_for_mode (code, mode))
> return 0;
>
> However, this used the wrong mode, since "mode" is here the integer
> result of the comparisons (and the mode of the IOR), not the mode of
> the things being compared. Thus the effect was to reject all
> floating-point-only codes, even when comparing floats.
>
> I think instead the code should detect whether the comparison is between
> integer values and remove UNORDERED from consideration if so. It then
> always produces a valid comparison (or an always true/false result),
> and so comparison_code_valid_for_mode is not needed. In particular,
> "true for LT + GT" becomes NE for comparisons between integers but
> remains LTGT for comparisons between floats.
>
> * There was a missing check for whether the comparison inputs had
> side effects.
>
> While there, it also seemed worth extending
> simplify_logical_relational_operation to unsigned comparisons, since
> that makes the testing easier.
>
> As far as that testing goes: the patch exhaustively tests all
> combinations of integer comparisons in:
>
> (cmp1 (cmp2 X Y) (cmp3 X Y))
>
> for the 10 integer comparisons, giving 1000 fold attempts in total.
> It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1})
> on the result of the fold, giving 9 checks per fold, or 9000 in total.
> That's probably more than is typical for self-tests, but it seems to
> complete in neglible time, even for -O0 builds.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install?
OK.
> The patch isn't exactly a spot fix, and the bug is ancient, so I suppose
> the patch probably isn't suitable for backports.
Maybe for GCC 14, but not without some soaking time of course.
Thanks,
Richard.
> Richard
>
>
> gcc/
> PR rtl-optimization/117186
> * rtl.h (simplify_context::simplify_logical_relational_operation): Add
> an invert0_p parameter.
> * simplify-rtx.cc (unsigned_comparison_to_mask): New function.
> (mask_to_unsigned_comparison): Likewise.
> (comparison_code_valid_for_mode): Delete.
> (simplify_context::simplify_logical_relational_operation): Add
> an invert0_p parameter. Handle AND and XOR. Handle unsigned
> comparisons. Handle always-false results. Ignore the low bit
> of the mask if the operands are always ordered and remove the
> then-redundant check of comparison_code_valid_for_mode. Check
> for side-effects in the operands before simplifying them away.
> (simplify_context::simplify_binary_operation_1): Remove
> simplification of (compare (gt ...) (lt ...)) and instead...
> (simplify_context::simplify_relational_operation_1): ...handle
> comparisons of comparisons here.
> (test_comparisons): New function.
> (test_scalar_ops): Call it.
>
> gcc/testsuite/
> PR rtl-optimization/117186
> * gcc.dg/torture/pr117186.c: New test.
> * gcc.target/aarch64/pr117186.c: Likewise.
> ---
> gcc/rtl.h | 3 +-
> gcc/simplify-rtx.cc | 276 ++++++++++++++------
> gcc/testsuite/gcc.dg/torture/pr117186.c | 15 ++
> gcc/testsuite/gcc.target/aarch64/pr117186.c | 128 +++++++++
> 4 files changed, 339 insertions(+), 83 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/torture/pr117186.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr117186.c
>
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index 3d211953b7d..3b676c46880 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -3478,7 +3478,8 @@ private:
> rtx simplify_byte_swapping_operation (rtx_code, machine_mode, rtx, rtx);
> rtx simplify_associative_operation (rtx_code, machine_mode, rtx, rtx);
> rtx simplify_distributive_operation (rtx_code, machine_mode, rtx, rtx);
> - rtx simplify_logical_relational_operation (rtx_code, machine_mode, rtx,
> rtx);
> + rtx simplify_logical_relational_operation (rtx_code, machine_mode, rtx,
> rtx,
> + bool = false);
> rtx simplify_binary_operation_series (rtx_code, machine_mode, rtx, rtx);
> rtx simplify_distribute_over_subregs (rtx_code, machine_mode, rtx, rtx);
> rtx simplify_shift_const_int (rtx_code, machine_mode, rtx, unsigned int);
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index c77ce956332..71c5d3c1b1b 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -2446,6 +2446,61 @@ simplify_context::simplify_associative_operation
> (rtx_code code,
> return 0;
> }
>
> +/* If COMPARISON can be treated as an unsigned comparison, return a mask
> + that represents it (8 if it includes <, 4 if it includes > and 2
> + if it includes ==). Return 0 otherwise. */
> +static int
> +unsigned_comparison_to_mask (rtx_code comparison)
> +{
> + switch (comparison)
> + {
> + case LTU:
> + return 8;
> + case GTU:
> + return 4;
> + case EQ:
> + return 2;
> +
> + case LEU:
> + return 10;
> + case GEU:
> + return 6;
> +
> + case NE:
> + return 12;
> +
> + default:
> + return 0;
> + }
> +}
> +
> +/* Reverse the mapping in unsigned_comparison_to_mask, going from masks
> + to comparisons. */
> +static rtx_code
> +mask_to_unsigned_comparison (int mask)
> +{
> + switch (mask)
> + {
> + case 8:
> + return LTU;
> + case 4:
> + return GTU;
> + case 2:
> + return EQ;
> +
> + case 10:
> + return LEU;
> + case 6:
> + return GEU;
> +
> + case 12:
> + return NE;
> +
> + default:
> + gcc_unreachable ();
> + }
> +}
> +
> /* Return a mask describing the COMPARISON. */
> static int
> comparison_to_mask (enum rtx_code comparison)
> @@ -2530,53 +2585,6 @@ mask_to_comparison (int mask)
> }
> }
>
> -/* Return true if CODE is valid for comparisons of mode MODE, false
> - otherwise.
> -
> - It is always safe to return false, even if the code was valid for the
> - given mode as that will merely suppress optimizations. */
> -
> -static bool
> -comparison_code_valid_for_mode (enum rtx_code code, enum machine_mode mode)
> -{
> - switch (code)
> - {
> - /* These are valid for integral, floating and vector modes. */
> - case NE:
> - case EQ:
> - case GE:
> - case GT:
> - case LE:
> - case LT:
> - return (INTEGRAL_MODE_P (mode)
> - || FLOAT_MODE_P (mode)
> - || VECTOR_MODE_P (mode));
> -
> - /* These are valid for floating point modes. */
> - case LTGT:
> - case UNORDERED:
> - case ORDERED:
> - case UNEQ:
> - case UNGE:
> - case UNGT:
> - case UNLE:
> - case UNLT:
> - return FLOAT_MODE_P (mode);
> -
> - /* These are filtered out in simplify_logical_operation, but
> - we check for them too as a matter of safety. They are valid
> - for integral and vector modes. */
> - case GEU:
> - case GTU:
> - case LEU:
> - case LTU:
> - return INTEGRAL_MODE_P (mode) || VECTOR_MODE_P (mode);
> -
> - default:
> - gcc_unreachable ();
> - }
> -}
> -
> /* Canonicalize RES, a scalar const0_rtx/const_true_rtx to the right
> false/true value of comparison with MODE where comparison operands
> have CMP_MODE. */
> @@ -2625,17 +2633,16 @@ relational_result (machine_mode mode, machine_mode
> cmp_mode, rtx res)
> }
>
> /* Simplify a logical operation CODE with result mode MODE, operating on OP0
> - and OP1, which should be both relational operations. Return 0 if no such
> - simplification is possible. */
> + and OP1, in the case where both are relational operations. Assume that
> + OP0 is inverted if INVERT0_P is true.
> +
> + Return 0 if no such simplification is possible. */
> rtx
> simplify_context::simplify_logical_relational_operation (rtx_code code,
> machine_mode mode,
> - rtx op0, rtx op1)
> + rtx op0, rtx op1,
> + bool invert0_p)
> {
> - /* We only handle IOR of two relational operations. */
> - if (code != IOR)
> - return 0;
> -
> if (!(COMPARISON_P (op0) && COMPARISON_P (op1)))
> return 0;
>
> @@ -2643,28 +2650,64 @@
> simplify_context::simplify_logical_relational_operation (rtx_code code,
> && rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1))))
> return 0;
>
> + if (side_effects_p (op0))
> + return 0;
> +
> enum rtx_code code0 = GET_CODE (op0);
> enum rtx_code code1 = GET_CODE (op1);
>
> - /* We don't handle unsigned comparisons currently. */
> - if (code0 == LTU || code0 == GTU || code0 == LEU || code0 == GEU)
> - return 0;
> - if (code1 == LTU || code1 == GTU || code1 == LEU || code1 == GEU)
> - return 0;
> + /* Assume at first that the comparisons are on integers, and that the
> + operands are therefore ordered. */
> + int all = 14;
> + int mask0 = unsigned_comparison_to_mask (code0);
> + int mask1 = unsigned_comparison_to_mask (code1);
> + bool unsigned_p = (IN_RANGE (mask0 & 12, 4, 8)
> + || IN_RANGE (mask1 & 12, 4, 8));
> + if (unsigned_p)
> + {
> + /* We only reach here when comparing integers. Reject mixtures of
> signed
> + and unsigned comparisons. */
> + if (mask0 == 0 || mask1 == 0)
> + return 0;
> + }
> + else
> + {
> + /* See whether the operands might be unordered. */
> + if (HONOR_NANS (GET_MODE (XEXP (op0, 0))))
> + all = 15;
> + mask0 = comparison_to_mask (code0) & all;
> + mask1 = comparison_to_mask (code1) & all;
> + }
>
> - int mask0 = comparison_to_mask (code0);
> - int mask1 = comparison_to_mask (code1);
> + if (invert0_p)
> + mask0 = mask0 ^ all;
>
> - int mask = mask0 | mask1;
> + int mask;
> + if (code == AND)
> + mask = mask0 & mask1;
> + else if (code == IOR)
> + mask = mask0 | mask1;
> + else if (code == XOR)
> + mask = mask0 ^ mask1;
> + else
> + return 0;
>
> - if (mask == 15)
> + if (mask == all)
> return relational_result (mode, GET_MODE (op0), const_true_rtx);
>
> - code = mask_to_comparison (mask);
> + if (mask == 0)
> + return relational_result (mode, GET_MODE (op0), const0_rtx);
>
> - /* Many comparison codes are only valid for certain mode classes. */
> - if (!comparison_code_valid_for_mode (code, mode))
> - return 0;
> + if (unsigned_p)
> + code = mask_to_unsigned_comparison (mask);
> + else
> + {
> + code = mask_to_comparison (mask);
> + /* LTGT and NE are arithmetically equivalent for ordered operands,
> + with NE being the canonical choice. */
> + if (code == LTGT && all == 14)
> + code = NE;
> + }
>
> op0 = XEXP (op1, 0);
> op1 = XEXP (op1, 1);
> @@ -3217,21 +3260,6 @@ simplify_context::simplify_binary_operation_1
> (rtx_code code,
> break;
>
> case COMPARE:
> - /* Convert (compare (gt (flags) 0) (lt (flags) 0)) to (flags). */
> - if (((GET_CODE (op0) == GT && GET_CODE (op1) == LT)
> - || (GET_CODE (op0) == GTU && GET_CODE (op1) == LTU))
> - && XEXP (op0, 1) == const0_rtx && XEXP (op1, 1) == const0_rtx)
> - {
> - rtx xop00 = XEXP (op0, 0);
> - rtx xop10 = XEXP (op1, 0);
> -
> - if (REG_P (xop00) && REG_P (xop10)
> - && REGNO (xop00) == REGNO (xop10)
> - && GET_MODE (xop00) == mode
> - && GET_MODE (xop10) == mode
> - && GET_MODE_CLASS (mode) == MODE_CC)
> - return xop00;
> - }
> break;
>
> case MINUS:
> @@ -6405,6 +6433,31 @@ simplify_context::simplify_relational_operation_1
> (rtx_code code,
> && INTVAL (XEXP (tmp, 1)) == GET_MODE_PRECISION (int_mode) - 1)
> return simplify_gen_binary (AND, mode, XEXP (tmp, 0), const1_rtx);
> }
> +
> + /* For two booleans A and B:
> +
> + A > B == ~B & A
> + A >= B == ~B | A
> + A < B == ~A & B
> + A <= B == ~A | B
> + A == B == ~A ^ B (== ~B ^ A)
> + A != B == A ^ B
> +
> + simplify_logical_relational_operation checks whether A and B
> + are booleans. */
> + if (code == GTU || code == GT)
> + return simplify_logical_relational_operation (AND, mode, op1, op0, true);
> + if (code == GEU || code == GE)
> + return simplify_logical_relational_operation (IOR, mode, op1, op0, true);
> + if (code == LTU || code == LT)
> + return simplify_logical_relational_operation (AND, mode, op0, op1, true);
> + if (code == LEU || code == LE)
> + return simplify_logical_relational_operation (IOR, mode, op0, op1, true);
> + if (code == EQ)
> + return simplify_logical_relational_operation (XOR, mode, op0, op1, true);
> + if (code == NE)
> + return simplify_logical_relational_operation (XOR, mode, op0, op1);
> +
> return NULL_RTX;
> }
>
> @@ -8493,6 +8546,63 @@ test_scalar_int_ext_ops2 (machine_mode bmode,
> machine_mode mmode,
> simplify_gen_unary (TRUNCATE, smode, mreg, mmode));
> }
>
> +/* Test comparisons of comparisons, with the inner comparisons being
> + between values of mode MODE2 and producing results of mode MODE1,
> + and with the outer comparisons producing results of mode MODE0. */
> +
> +static void
> +test_comparisons (machine_mode mode0, machine_mode mode1, machine_mode mode2)
> +{
> + rtx reg0 = make_test_reg (mode2);
> + rtx reg1 = make_test_reg (mode2);
> +
> + static const rtx_code codes[] = {
> + EQ, NE, LT, LTU, LE, LEU, GE, GEU, GT, GTU
> + };
> + constexpr auto num_codes = ARRAY_SIZE (codes);
> + rtx cmps[num_codes];
> + rtx vals[] = { constm1_rtx, const0_rtx, const1_rtx };
> +
> + for (unsigned int i = 0; i < num_codes; ++i)
> + cmps[i] = gen_rtx_fmt_ee (codes[i], mode1, reg0, reg1);
> +
> + for (auto code : codes)
> + for (unsigned int i0 = 0; i0 < num_codes; ++i0)
> + for (unsigned int i1 = 0; i1 < num_codes; ++i1)
> + {
> + rtx cmp_res = simplify_relational_operation (code, mode0, mode1,
> + cmps[i0], cmps[i1]);
> + if (i0 >= 2 && i1 >= 2 && (i0 ^ i1) & 1)
> + ASSERT_TRUE (cmp_res == NULL_RTX);
> + else
> + {
> + ASSERT_TRUE (cmp_res != NULL_RTX
> + && (CONSTANT_P (cmp_res)
> + || (COMPARISON_P (cmp_res)
> + && GET_MODE (cmp_res) == mode0
> + && REG_P (XEXP (cmp_res, 0))
> + && REG_P (XEXP (cmp_res, 1)))));
> + for (rtx reg0_val : vals)
> + for (rtx reg1_val : vals)
> + {
> + rtx val0 = simplify_const_relational_operation
> + (codes[i0], mode1, reg0_val, reg1_val);
> + rtx val1 = simplify_const_relational_operation
> + (codes[i1], mode1, reg0_val, reg1_val);
> + rtx val = simplify_const_relational_operation
> + (code, mode0, val0, val1);
> + rtx folded = cmp_res;
> + if (COMPARISON_P (cmp_res))
> + folded = simplify_const_relational_operation
> + (GET_CODE (cmp_res), mode0,
> + XEXP (cmp_res, 0) == reg0 ? reg0_val : reg1_val,
> + XEXP (cmp_res, 1) == reg0 ? reg0_val : reg1_val);
> + ASSERT_RTX_EQ (val, folded);
> + }
> + }
> + }
> +}
> +
>
> /* Verify some simplifications involving scalar expressions. */
>
> @@ -8517,6 +8627,8 @@ test_scalar_ops ()
> test_scalar_int_ext_ops2 (DImode, HImode, QImode);
> test_scalar_int_ext_ops2 (DImode, SImode, QImode);
> test_scalar_int_ext_ops2 (DImode, SImode, HImode);
> +
> + test_comparisons (QImode, HImode, SImode);
> }
>
> /* Test vector simplifications involving VEC_DUPLICATE in which the
> diff --git a/gcc/testsuite/gcc.dg/torture/pr117186.c
> b/gcc/testsuite/gcc.dg/torture/pr117186.c
> new file mode 100644
> index 00000000000..83bcb650102
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr117186.c
> @@ -0,0 +1,15 @@
> +/* { dg-do run } */
> +
> +[[gnu::noipa]] int
> +f1 (int x, int y)
> +{
> + return (x < y) < (y < x);
> +}
> +
> +int
> +main (void)
> +{
> + if (f1 (-1, 0))
> + __builtin_abort ();
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr117186.c
> b/gcc/testsuite/gcc.target/aarch64/pr117186.c
> new file mode 100644
> index 00000000000..d6e2a4af58c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr117186.c
> @@ -0,0 +1,128 @@
> +/* { dg-options "-O" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +/*
> +** f1:
> +** (
> +** cmp w0, w1
> +** cset w0, gt
> +** |
> +** cmp w1, w0
> +** cset w0, lt
> +** )
> +** ret
> +*/
> +int
> +f1 (int x, int y)
> +{
> + return (x < y) < (y < x);
> +}
> +
> +/*
> +** f2:
> +** mov w0, 0
> +** ret
> +*/
> +int
> +f2 (int x, int y)
> +{
> + return (x <= y) < (x < y);
> +}
> +
> +/*
> +** f3:
> +** cmp (w0, w1|w1, w0)
> +** cset w0, ne
> +** ret
> +*/
> +int
> +f3 (int x, int y)
> +{
> + return (x <= y) == (x < y);
> +}
> +
> +/*
> +** f4:
> +** cmp (w0, w1|w1, w0)
> +** cset w0, ne
> +** ret
> +*/
> +int
> +f4 (int x, int y)
> +{
> + return (y < x) != (x < y);
> +}
> +
> +/*
> +** f5:
> +** cmp (w0, w1|w1, w0)
> +** cset w0, eq
> +** ret
> +*/
> +int
> +f5 (int x, int y)
> +{
> + return (y >= x) > (y > x);
> +}
> +
> +/*
> +** f6:
> +** (
> +** cmp w0, w1
> +** cset w0, ge
> +** |
> +** cmp w1, w0
> +** cset w0, le
> +** )
> +** ret
> +*/
> +int
> +f6 (int x, int y)
> +{
> + return (x < y) < (y <= x);
> +}
> +
> +/*
> +** f7:
> +** mov w0, 1
> +** ret
> +*/
> +int
> +f7 (int x, int y)
> +{
> + return (x < y) <= (x <= y);
> +}
> +
> +/*
> +** f8:
> +** (
> +** cmp w0, w1
> +** cset w0, hi
> +** |
> +** cmp w1, w0
> +** cset w0, ls
> +** )
> +** ret
> +*/
> +int
> +f8 (unsigned int x, unsigned int y)
> +{
> + return (x < y) < (y < x);
> +}
> +
> +/*
> +** f9:
> +** (
> +** cmp w0, w1
> +** cset w0, (hs|cs)
> +** |
> +** cmp w1, w0
> +** cset w0, (lo|cc)
> +** )
> +** ret
> +*/
> +int
> +f9 (unsigned int x, unsigned int y)
> +{
> + return (x < y) < (y <= x);
> +}
> --
> 2.25.1
>