On Fri, Oct 21, 2022 at 9:15 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> As the testcase shows, when cbranchbf4/cstorebf4 patterns are defined,
> we can get ICEs for conditional moves.
> The problem is that the generic conditional move expansion just calls
> prepare_cmp_insn, which only checks that a cbranch<mode>4 pattern exists,
> returns the comparison unchanged and passes it down to the conditional
> move optabs.
> The following patch fixes it by punting if the comparison isn't
> ix86_fp_comparison_operator (to tell the generic code it should emit the
> comparison separately) and by promoting BFmode comparison operands to
> SFmode so that the comparison is performed in SFmode.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2022-10-21  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/107322
>         * config/i386/i386-expand.cc (ix86_prepare_fp_compare_args): For
>         BFmode comparisons promote arguments to SFmode and recurse.
>         (ix86_expand_int_movcc, ix86_expand_fp_movcc): Return false early
>         if comparison operands are BFmode and operands[1] is not
>         ix86_fp_comparison_operator.
>
>         * gcc.target/i386/pr107322.c: New test.

OK, but now we have two more copies of code that effectively extends
BF to SF. Can you please split this out into a utility function and
use it here and in cbranchbf4/cstorebf4? I'm talking about this part:

+      op = gen_lowpart (HImode, op1);
+      if (CONST_INT_P (op))
+       op = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+                                            op1, BFmode);
+      else
+       {
+         rtx t1 = gen_reg_rtx (SImode);
+         emit_insn (gen_zero_extendhisi2 (t1, op));
+         emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
+         op = gen_lowpart (SFmode, t1);
+       }

Taking this a bit further, it looks like this could become a generic
function to extend BF to SF when the extendbfsf2 named pattern is not
defined.
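
For reference, a minimal sketch of what such a shared helper might look
like (the function name here is made up; the body just factors out the
sequence quoted above: fold constant operands via
simplify_const_unary_operation, otherwise zero-extend the 16-bit payload
into the high half of an SImode register and reinterpret it as SFmode):

  static rtx
  expand_bf_to_sf (rtx op)   /* hypothetical name, not an existing function */
  {
    rtx lo = gen_lowpart (HImode, op);
    if (CONST_INT_P (lo))
      /* Constant operand: fold the extension at compile time.  */
      return simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
					     op, BFmode);
    rtx tmp = gen_reg_rtx (SImode);
    /* Zero-extend the 16-bit bit pattern ...  */
    emit_insn (gen_zero_extendhisi2 (tmp, lo));
    /* ... and shift it into the upper half, which is exactly the SFmode
       representation of the same value.  */
    emit_insn (gen_ashlsi3 (tmp, tmp, GEN_INT (16)));
    return gen_lowpart (SFmode, tmp);
  }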

The above could be a follow-up patch, the proposed patch is OK.

On a related note, I still think that without corresponding BFmode
expanders, generic middle-end code should extend BFmode to SFmode and
perform all comparisons in SFmode, in effect doing what the x86
cbranchbf4/cstorebf4 expanders are doing now by themselves. This would
allow cbranchbf4/cstorebf4 to fail (or to not be present at all) and
still result in optimal code without intermediate extends and truncations.
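
The reason this is safe: a BFmode value's bit pattern is the upper 16
bits of the corresponding SFmode value, so the extension to SFmode is
exact and the comparison result (including NaN handling) is unchanged.
In source terms, the middle end could treat a __bf16 compare as its
float counterpart; an illustration only, not existing middle-end code:

  int
  bf_less (__bf16 a, __bf16 b)
  {
    /* Extension of __bf16 to float is exact, so this is equivalent to
       comparing a and b directly in BFmode.  */
    return (float) a < (float) b;
  }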

Thanks,
Uros.

> --- gcc/config/i386/i386-expand.cc.jj   2022-10-19 11:20:54.602879162 +0200
> +++ gcc/config/i386/i386-expand.cc      2022-10-20 12:15:37.750758679 +0200
> @@ -2626,6 +2626,35 @@ ix86_prepare_fp_compare_args (enum rtx_c
>    machine_mode op_mode = GET_MODE (op0);
>    bool is_sse = SSE_FLOAT_MODE_SSEMATH_OR_HF_P (op_mode);
>
> +  if (op_mode == BFmode)
> +    {
> +      rtx op = gen_lowpart (HImode, op0);
> +      if (CONST_INT_P (op))
> +       op = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +                                            op0, BFmode);
> +      else
> +       {
> +         rtx t1 = gen_reg_rtx (SImode);
> +         emit_insn (gen_zero_extendhisi2 (t1, op));
> +         emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> +         op = gen_lowpart (SFmode, t1);
> +       }
> +      *pop0 = op;
> +      op = gen_lowpart (HImode, op1);
> +      if (CONST_INT_P (op))
> +       op = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
> +                                            op1, BFmode);
> +      else
> +       {
> +         rtx t1 = gen_reg_rtx (SImode);
> +         emit_insn (gen_zero_extendhisi2 (t1, op));
> +         emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
> +         op = gen_lowpart (SFmode, t1);
> +       }
> +      *pop1 = op;
> +      return ix86_prepare_fp_compare_args (code, pop0, pop1);
> +    }
> +
>    /* All of the unordered compare instructions only work on registers.
>       The same is true of the fcomi compare instructions.  The XFmode
>       compare instructions require registers except when comparing
> @@ -3164,6 +3193,10 @@ ix86_expand_int_movcc (rtx operands[])
>           && !TARGET_64BIT))
>      return false;
>
> +  if (GET_MODE (op0) == BFmode
> +      && !ix86_fp_comparison_operator (operands[1], VOIDmode))
> +    return false;
> +
>    start_sequence ();
>    compare_op = ix86_expand_compare (code, op0, op1);
>    compare_seq = get_insns ();
> @@ -4238,6 +4271,10 @@ ix86_expand_fp_movcc (rtx operands[])
>    rtx op0 = XEXP (operands[1], 0);
>    rtx op1 = XEXP (operands[1], 1);
>
> +  if (GET_MODE (op0) == BFmode
> +      && !ix86_fp_comparison_operator (operands[1], VOIDmode))
> +    return false;
> +
>    if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
>      {
>        machine_mode cmode;
> --- gcc/testsuite/gcc.target/i386/pr107322.c.jj 2022-10-20 12:28:46.829983399 +0200
> +++ gcc/testsuite/gcc.target/i386/pr107322.c    2022-10-20 12:29:44.287201650 +0200
> @@ -0,0 +1,33 @@
> +/* PR target/107322 */
> +/* { dg-do compile } */
> +/* { dg-options "-fexcess-precision=16 -O -msse2 -mfpmath=sse" } */
> +
> +int i, j;
> +float k, l;
> +__bf16 f;
> +
> +void
> +foo (void)
> +{
> +  i *= 0 >= f;
> +}
> +
> +void
> +bar (void)
> +{
> +  i *= 0 <= f;
> +}
> +
> +void
> +baz (int x, int y)
> +{
> +  i = 0 >= f ? x : y;
> +  j = 0 <= f ? x + 2 : y + 3;
> +}
> +
> +void
> +qux (float x, float y)
> +{
> +  k = 0 >= f ? x : y;
> +  l = 0 <= f ? x + 2 : y + 3;
> +}
>
>         Jakub
>
