Many thanks.  Here's the version that I've committed, with a ??? comment as
requested (and even a no-op else clause to make the logic easier to understand).

2021-08-24  Roger Sayle  <ro...@nextmovesoftware.com>
            Richard Biener  <rguent...@suse.de>

gcc/ChangeLog
        * config/i386/i386-features.c (compute_convert_gain): Provide
        more accurate values for CONST_INT, when optimizing for size.
        * config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
        * config/i386/i386.h (COSTS_N_BYTES): to here.

Cheers,
Roger
--

-----Original Message-----
From: Richard Biener <richard.guent...@gmail.com> 
Sent: 23 August 2021 14:47
To: Roger Sayle <ro...@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

On Fri, Aug 20, 2021 at 9:55 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
> Hi Richard,
>
> Benchmarking this patch using CSiBE on x86_64-pc-linux-gnu with -Os -m32 
> saves 2432 bytes.
> Of the 893 tests, 34 have size differences, 30 are improvements, 4 are 
> regressions (of a few bytes).
>
> > Also I'm missing a 'else' - in the default case there's no cost/benefit of 
> > using SSE vs. GPR regs?
> > For SSE it would be a constant pool load.
>
> The code size regression I primarily wanted to tackle was the zero 
> vs. non-zero case when dealing with immediate operands, which was the 
> piece affected by my and Jakub's xor improvements.
>
> Alas, my first attempt to specify a non-zero gain in the default 
> (doesn't fit in SImode) case increased the code size slightly.  The 
> use of the constant pool complicates things, as the number of times 
> the same value is used becomes an issue.  If the constant being loaded 
> is unique, then clearly the increase in constant pool size should 
> (ideally) be taken into account.  But if the same constant is used 
> multiple times in a chain (or is already in the constant pool), the 
> observed cost is much cheaper.  Empirically, a value of zero isn't a 
> poor choice, so the decision on whether to use vector instructions is shifted 
> to the gains from operations being performed, rather than the loading of 
> integer constants.  No doubt, like rtx_costs, these are free parameters that 
> future generations will continue to tweak and refine.
>
> Given that this patch reduces code size with -Os, both with and without -m32, 
> ok for mainline?

OK if you add a comment for the missing 'else'.

Thanks,
Richard.

> Thanks in advance,
> Roger
> --
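
As a rough illustration of the constant pool point in the quoted message
(this is not code from the patch, and the function and parameter names are
hypothetical): the size a pool entry adds per use shrinks as the entry is
shared, and disappears if the entry already exists for another reason.

```cpp
// Hypothetical model, not GCC code: extra constant-pool bytes attributed
// to each use of a constant that has to be loaded from memory.
constexpr double pool_bytes_per_use(int entry_bytes, int uses,
                                    bool already_in_pool) {
  // An entry that is already present adds no new pool bytes.
  if (already_in_pool)
    return 0.0;
  // Assumption: treat a non-positive use count as a single use.
  if (uses < 1)
    uses = 1;
  // Otherwise the entry's size is spread across all of its uses.
  return static_cast<double>(entry_bytes) / uses;
}
```

Under this model a unique 16-byte entry costs 16 bytes, the same entry
shared four times costs 4 bytes per use, and an existing entry costs
nothing, which is one way to see why no single non-zero constant fits
every case.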

diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index d9c6652..5a99ea7 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -610,12 +610,40 @@ general_scalar_chain::compute_convert_gain ()
 
          case CONST_INT:
            if (REG_P (dst))
-             /* DImode can be immediate for TARGET_64BIT and SImode always.  */
-             igain += m * COSTS_N_INSNS (1);
+             {
+               if (optimize_insn_for_size_p ())
+                 {
+                   /* xor (2 bytes) vs. xorps (3 bytes).  */
+                   if (src == const0_rtx)
+                     igain -= COSTS_N_BYTES (1);
+                   /* movdi_internal vs. movv2di_internal.  */
+                   /* => mov (5 bytes) vs. movaps (7 bytes).  */
+                   else if (x86_64_immediate_operand (src, SImode))
+                     igain -= COSTS_N_BYTES (2);
+                   else
+                     /* ??? Larger immediate constants are placed in the
+                        constant pool, where the size benefit/impact of
+                        STV conversion is affected by whether and how
+                        often each constant pool entry is shared/reused.
+                        The value below is empirically derived from the
+                        CSiBE benchmark (and the optimal value may drift
+                        over time).  */
+                     igain += COSTS_N_BYTES (0);
+                 }
+               else
+                 {
+                   /* DImode can be immediate for TARGET_64BIT
+                      and SImode always.  */
+                   igain += m * COSTS_N_INSNS (1);
+                   igain -= vector_const_cost (src);
+                 }
+             }
            else if (MEM_P (dst))
-             igain += (m * ix86_cost->int_store[2]
-                       - ix86_cost->sse_store[sse_cost_idx]);
-           igain -= vector_const_cost (src);
+             {
+               igain += (m * ix86_cost->int_store[2]
+                         - ix86_cost->sse_store[sse_cost_idx]);
+               igain -= vector_const_cost (src);
+             }
            break;
 
          default:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4d4ab6a..5abf2a6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19982,8 +19982,6 @@ ix86_division_cost (const struct processor_costs *cost,
     return cost->divide[MODE_INDEX (mode)];
 }
 
-#define COSTS_N_BYTES(N) ((N) * 2)
-
 /* Return cost of shift in MODE.
    If CONSTANT_OP1 is true, the op1 value is known and set in OP1_VAL.
    AND_IN_OP1 specify in op1 is result of and and SHIFT_AND_TRUNCATE
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 21fe51b..edbfcaf 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -88,6 +88,11 @@ struct stringop_algs
   } size [MAX_STRINGOP_ALGS];
 };
 
+/* Analog of COSTS_N_INSNS when optimizing for size.  */
+#ifndef COSTS_N_BYTES
+#define COSTS_N_BYTES(N) ((N) * 2)
+#endif
+
 /* Define the specific costs for a given cpu.  NB: hard_register is used
    by TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST to compute
    hard register move costs by register allocator.  Relative costs of
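
For readers who want to experiment with the -Os branch outside of GCC,
here is a minimal standalone sketch of the three-way CONST_INT
classification in the patch above.  The names costs_n_bytes, ConstKind,
classify and size_gain_delta are hypothetical, and the SImode test is
assumed to mean "fits in a sign-extended 32-bit immediate"; the real
code uses x86_64_immediate_operand on RTL.

```cpp
#include <cstdint>
#include <limits>

// Same scale as the patch's COSTS_N_BYTES: one byte counts as 2 units.
constexpr int costs_n_bytes(int n) { return n * 2; }

// Hypothetical stand-in for the three cases the patch distinguishes.
enum class ConstKind { Zero, FitsImm32, NeedsConstantPool };

constexpr ConstKind classify(std::int64_t v) {
  if (v == 0)
    return ConstKind::Zero;
  // Assumption: approximates x86_64_immediate_operand (src, SImode).
  if (v >= std::numeric_limits<std::int32_t>::min()
      && v <= std::numeric_limits<std::int32_t>::max())
    return ConstKind::FitsImm32;
  return ConstKind::NeedsConstantPool;
}

// Change applied to igain for one constant load when optimizing for size.
constexpr int size_gain_delta(std::int64_t v) {
  switch (classify(v)) {
  case ConstKind::Zero:
    return -costs_n_bytes(1);   // xor (2 bytes) vs. xorps (3 bytes)
  case ConstKind::FitsImm32:
    return -costs_n_bytes(2);   // mov (5 bytes) vs. movaps (7 bytes)
  case ConstKind::NeedsConstantPool:
    return costs_n_bytes(0);    // empirically chosen no-op
  }
  return 0;
}
```

With these definitions, size_gain_delta(0) is -2, size_gain_delta(42) is
-4, and a constant such as 1 << 40 leaves the gain unchanged.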
