Thanks, this is the patch I'm going to check-in.
For general ccmp scenario, the tree sequence is like
_1 = (a < b)
_2 = (c < d)
_3 = _1 & _2
current ccmp expanding will try to swap compare order for _1 and _2,
compare the expansion cost/cost2 for expanding _1 or _2 first, then
return the sequence with lower cost.
It is possible that one expansion succeeds and the other fails.
For example, x86 has int ccmp but not fp ccmp, so a combined fp and
int comparison must be ordered such that the fp comparison happens
first. The costs are not meaningful for failed expansions.
Check the expand_ccmp_next result ret and ret2, returns the valid one
before cost comparison.
gcc/ChangeLog:
* ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
expand_ccmp_next, returns the valid one first instead of
comparing cost.
---
gcc/ccmp.cc | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
index 7cb525addf4..4d50708d986 100644
--- a/gcc/ccmp.cc
+++ b/gcc/ccmp.cc
@@ -247,7 +247,15 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn
**prep_seq, rtx_insn **gen_seq)
cost2 = seq_cost (prep_seq_2, speed_p);
cost2 += seq_cost (gen_seq_2, speed_p);
}
- if (cost2 < cost1)
+
+ /* It's possible that one expansion succeeds and the other
+ fails.
+ For example, x86 has int ccmp but not fp ccmp, and so a
+ combined fp and int comparison must be ordered such that
+ the fp comparison happens first. The costs are not
+ meaningful for failed expansions. */
+
+ if (ret2 && (!ret || cost2 < cost1))
{
*prep_seq = prep_seq_2;
*gen_seq = gen_seq_2;
--
2.31.1
Richard Sandiford <[email protected]> 于2024年6月5日周三 17:21写道:
>
> Hongyu Wang <[email protected]> writes:
> > CC'd Richard for ccmp part as previously it is added only for aarch64.
> > The original logic will not interrupted since if
> > aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also
> > success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the
> > prepare_operand will fixup the input that cmp supports but ccmp not,
> > so ret/ret2 will all be valid when comparing cost.
> > Thanks in advance.
>
> Sorry for the slow review.
>
> > Hongyu Wang <[email protected]> 于2024年5月15日周三 16:22写道:
> >>
> >> For general ccmp scenario, the tree sequence is like
> >>
> >> _1 = (a < b)
> >> _2 = (c < d)
> >> _3 = _1 & _2
> >>
> >> current ccmp expanding will try to swap compare order for _1 and _2,
> >> compare the cost/cost2 between compare _1 and _2 first, then return the
> >> sequence with lower cost.
> >>
> >> For x86 ccmp, we don't support FP compare as ccmp operand, but we
> >> support fp comi + int ccmp sequence. With current cost comparison
> >> model, the fp comi + int ccmp can never be generated since it doesn't
> >> check whether expand_ccmp_next returns available result and the rtl
> >> cost for the empty ccmp sequence is always smaller.
> >>
> >> Check the expand_ccmp_next result ret and ret2, returns the valid one
> >> before cost comparison.
> >>
> >> gcc/ChangeLog:
> >>
> >> * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
> >> expand_ccmp_next, returns the valid one first before
> >> comparing cost.
> >> ---
> >> gcc/ccmp.cc | 12 +++++++++++-
> >> 1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> >> index 7cb525addf4..4b424220068 100644
> >> --- a/gcc/ccmp.cc
> >> +++ b/gcc/ccmp.cc
> >> @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq,
> >> rtx_insn **gen_seq)
> >> cost2 = seq_cost (prep_seq_2, speed_p);
> >> cost2 += seq_cost (gen_seq_2, speed_p);
> >> }
> >> - if (cost2 < cost1)
> >> +
> >> + /* For x86 target the ccmp does not support fp operands, but
> >> + have fcomi insn that can produce eflags and then do int
> >> + ccmp. So if one of the op is fp compare, ret1 or ret2 can
> >> + fail, and the cost of the corresponding empty seq will
> >> + always be smaller, then the NULL sequence will be returned.
> >> + Add check for ret and ret2, returns the available one if
> >> + the other is NULL. */
>
> I think the more fundamental point is that the cost of a failed
> expansion isn't meaningful. So how about:
>
> /* It's possible that one expansion succeeds and the other fails.
> For example, x86 has int ccmp but not fp ccmp, and so a combined
> fp and int comparison must be ordered such that the fp comparison
> happens first. The costs are not meaningful for failed
> expansions. */
>
> >> + if ((!ret && ret2)
> >> + || (!(ret && !ret2)
> >> + && cost2 < cost1))
>
> I think this simplifies to:
>
> if (ret2 && (!ret1 || cost2 < cost1))
>
> OK with those changes, thanks.
>
> Richard
>
> >> {
> >> *prep_seq = prep_seq_2;
> >> *gen_seq = gen_seq_2;
> >> --
> >> 2.31.1
> >>