Hi Jakub,
I've been bootstrapping and regression testing a variant of Jakub's
previously proposed change:

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f38e3db41fa..123948a2dea 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21883,7 +21883,10 @@ ix86_rtx_costs (rtx x, machine_mode mode, int
outer_cod
e_i, int opno,
     case SYMBOL_REF:
       if (x86_64_immediate_operand (x, VOIDmode))
        *total = 0;
-     else
+      else if (x86_64_zext_immediate_operand (x, VOIDmode))
+       /* movl is as expensive as a simple instruction.  */
+       *total = COSTS_N_INSNS (1);
+      else
        /* movabsq is slightly more expensive than a simple instruction. */
        *total = COSTS_N_INSNS (1) + 1;
       return true;

Which more accurately models the costs (matching the three types of operands
he describes).
The plan was to then investigate the RTL expansion, to improve shift
generation, so that
expand sees the "actual" instruction count, and make a more informed
decision on which 
division implementation to use.


p.s. It's odd this is a P1.  It's not wrong code, but a single-cycle
instruction timing issue
(where the current trunk implementation is typically more correct than GCC
14's).

Thoughts?


> -----Original Message-----
> From: Jakub Jelinek <ja...@redhat.com>
> Sent: 02 April 2025 12:30
> To: Richard Biener <rguent...@suse.de>; Jan Hubicka <j...@suse.cz>; Uros
Bizjak
> <ubiz...@gmail.com>; Roger Sayle <ro...@nextmovesoftware.com>; Richard
> Sandiford <richard.sandif...@arm.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH] rtlanal, i386: Adjust pattern_cost and x86 constant cost
> [PR115910]
> 
> Hi!
> 
> Below is an attempt to fix up RTX costing P1 caused by r15-775
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/thread.html#652446
> @@ -21562,7 +21562,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int
> outer_code_i, int opno,
>        if (x86_64_immediate_operand (x, VOIDmode))
>       *total = 0;
>       else
> -     *total = 1;
> +     /* movabsq is slightly more expensive than a simple instruction. */
> +     *total = COSTS_N_INSNS (1) + 1;
>        return true;
> 
>      case CONST_DOUBLE:
> change.  In my understanding this was partially trying to workaround weird
code
> in pattern_cost, which uses
>   return cost > 0 ? cost : COSTS_N_INSNS (1); That doesn't make sense to
me.  All
> costs smaller than COSTS_N_INSNS (1) mean we need to have at least one
> instruction there which has the COSTS_N_INSNS (1) minimal cost.  So
special
> casing just cost 0 for the really cheap immediates which can be used
pretty much
> everywhere but not ones which have just tiny bit larger cost than that (1,
2 or 3) is
> just weird.
> 
> So, the following patch changes that to MAX (COSTS_N_INSNS (1), cost)
which
> doesn't have this weird behavior where set_src_cost 0 is considered more
> expensive than set_src_cost 1.
> 
> Note, pattern_cost isn't the only spot where costs are computed and
normally we
> often sum the subcosts of different parts of a pattern or just query rtx
costs of
> different parts of subexpressions, so the jump from
> 1 to 5 is quite significant.
> 
> Additionally, x86_64 doesn't have just 2 kinds of constants with different
costs, it
> has 3, signed 32-bit ones are the ones which can appear in almost all
instructions
> and so using cost of 0 for those looks best, then unsigned 32-bit ones
which can be
> done with still cheap movl instruction (and I think some others too) and
finally full
> 64-bit ones which can be done only with a single movabsq instruction and
are
> quite costly both in instruction size and even more expensive to execute.
> 
> The following patch attempts to restore the behavior of GCC 14 with the
> pattern_cost hunk fixed for the unsigned 32-bit ones and only keeps the
bigger
> cost for the 64-bit ones.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, bootstraps/regtests
in
> progress on aarch64-linux, powerpc64le-linux and s390x-linux.
> 
> I don't have a setup to test this on SPEC though.
> 
> Your thoughts on this?
> 
> 2025-04-02  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR target/115910
>       * rtlanal.cc (pattern_cost): Return at least COSTS_N_INSNS (1)
>       rather than just COSTS_N_INTNS (1) for cost <= 0.
>       * config/i386/i386.cc (ix86_rtx_costs): Set *total to 1 for
>       TARGET_64BIT x86_64_zext_immediate_operand constants.
> 
>       * gcc.target/i386/pr115910.c: New test.
> 
> --- gcc/rtlanal.cc.jj 2025-03-06 11:08:20.729230232 +0100
> +++ gcc/rtlanal.cc    2025-04-02 12:08:33.327409772 +0200
> @@ -5772,7 +5772,7 @@ pattern_cost (rtx pat, bool speed)
>      return 0;
> 
>    cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
> -  return cost > 0 ? cost : COSTS_N_INSNS (1);
> +  return MAX (COSTS_N_INSNS (1), cost);
>  }
> 
>  /* Calculate the cost of a single instruction.  A return value of zero
> --- gcc/config/i386/i386.cc.jj        2025-03-27 23:35:17.798315113 +0100
> +++ gcc/config/i386/i386.cc   2025-04-02 12:14:25.522539997 +0200
> @@ -21883,7 +21883,11 @@ ix86_rtx_costs (rtx x, machine_mode mode
>      case SYMBOL_REF:
>        if (x86_64_immediate_operand (x, VOIDmode))
>       *total = 0;
> -     else
> +      else if (TARGET_64BIT && x86_64_zext_immediate_operand (x,
VOIDmode))
> +     /* Consider the zext constants slightly more expensive, as they
> +        can't appear in most instructions.  */
> +     *total = 1;
> +      else
>       /* movabsq is slightly more expensive than a simple instruction. */
>       *total = COSTS_N_INSNS (1) + 1;
>        return true;
> --- gcc/testsuite/gcc.target/i386/pr115910.c.jj       2025-04-02
> 12:27:36.199606571 +0200
> +++ gcc/testsuite/gcc.target/i386/pr115910.c  2025-04-02
> 12:34:12.025132993 +0200
> @@ -0,0 +1,20 @@
> +/* PR target/115910 */
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -march=x86-64 -mtune=generic -masm=att" } */
> +/* { dg-final { scan-assembler-times {\timulq\t} 2 } } */
> +/* { dg-final { scan-assembler-times {\tshrq\t\$33,} 2 } } */
> +/* { dg-final { scan-assembler-not {\tsarl\t} } } */
> +
> +int
> +foo (int x)
> +{
> +  if (x < 0)
> +    __builtin_unreachable ();
> +  return x / 3U;
> +}
> +
> +int
> +bar (int x)
> +{
> +  return x / 3U;
> +}
> 
>       Jakub


Reply via email to