Many thanks. Here’s the version that I've committed with a ??? comment as requested (even a no-op else clause to make the logic easier to understand).

2021-08-24  Roger Sayle  <ro...@nextmovesoftware.com>
            Richard Biener  <rguent...@suse.de>

gcc/ChangeLog
        * config/i386/i386-features.c (compute_convert_gain): Provide
        more accurate values for CONST_INT, when optimizing for size.
        * config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
        * config/i386/i386.h (COSTS_N_BYTES): to here.

Cheers,
Roger
--

-----Original Message-----
From: Richard Biener <richard.guent...@gmail.com>
Sent: 23 August 2021 14:47
To: Roger Sayle <ro...@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

On Fri, Aug 20, 2021 at 9:55 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
> Hi Richard,
>
> Benchmarking this patch using CSiBE on x86_64-pc-linux-gnu with -Os -m32
> saves 2432 bytes.
> Of the 893 tests, 34 have size differences, 30 are improvements, 4 are
> regressions (of a few bytes).
>
> > Also I'm missing a 'else' - in the default case there's no cost/benefit of
> > using SSE vs. GPR regs?
> > For SSE it would be a constant pool load.
>
> The code size regression I primarily wanted to tackle was the zero
> vs. non-zero case when dealing with immediate operands, which was the
> piece affected by my and Jakub's xor improvements.
>
> Alas my first attempt to specify a non-zero gain in the default
> (doesn't fit in SImode) case, increased the code size slightly.  The
> use of the constant pool complicates things, as the number of times
> the same value is used becomes an issue.  If the constant being loaded
> is unique, then clearly the increase in constant pool size should
> (ideally) be taken into account.  But if the same constant is used
> multiple times in a chain (or is already in the constant pool), the
> observed cost is much cheaper.  Empirically, a value of zero isn't a
> poor choice, so the decision on whether to use vector instructions is
> shifted to the gains from operations being performed, rather than the
> loading of integer constants.  No doubt, like rtx_costs, these are free
> parameters that future generations will continue to tweak and refine.
>
> Given that this patch reduces code size with -Os, both with and without
> -m32, ok for mainline?

OK if you add a comment for the missing 'else'.

Thanks,
Richard.

> Thanks in advance,
> Roger
> --

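The three-way -Os decision described above can be sketched outside of GCC as a small standalone C fragment. This is only an illustration under stated assumptions: COSTS_N_BYTES is copied from the patch, while fits_simm32 and size_gain_for_const are hypothetical names that do not exist in GCC, and fits_simm32 only approximates what x86_64_immediate_operand (src, SImode) accepts for a CONST_INT.

```c
#include <stdint.h>

/* Same definition the patch moves into i386.h.  */
#define COSTS_N_BYTES(N) ((N) * 2)

/* Hypothetical stand-in for x86_64_immediate_operand (src, SImode):
   assumed here to mean "fits in a sign-extended 32-bit immediate".  */
int
fits_simm32 (int64_t value)
{
  return value >= INT32_MIN && value <= INT32_MAX;
}

/* Hypothetical helper mirroring the -Os branch of the patch: the
   change applied to igain when VALUE is loaded into a register.
   Negative values argue against converting the chain to vectors.  */
int
size_gain_for_const (int64_t value)
{
  if (value == 0)
    /* xor (2 bytes) vs. xorps (3 bytes): one byte worse.  */
    return -COSTS_N_BYTES (1);
  else if (fits_simm32 (value))
    /* mov (5 bytes) vs. movaps (7 bytes): two bytes worse.  */
    return -COSTS_N_BYTES (2);
  else
    /* Constant pool load: treated as neutral, per the CSiBE
       measurements quoted in the thread.  */
    return COSTS_N_BYTES (0);
}
```

With the patch's scaling of two cost units per byte, size_gain_for_const (0) is -2, size_gain_for_const (42) is -4, and a constant such as 0x123456789 that does not fit in 32 bits contributes 0.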
diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index d9c6652..5a99ea7 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -610,12 +610,40 @@ general_scalar_chain::compute_convert_gain ()
 
           case CONST_INT:
             if (REG_P (dst))
-              /* DImode can be immediate for TARGET_64BIT and SImode always.  */
-              igain += m * COSTS_N_INSNS (1);
+              {
+                if (optimize_insn_for_size_p ())
+                  {
+                    /* xor (2 bytes) vs. xorps (3 bytes).  */
+                    if (src == const0_rtx)
+                      igain -= COSTS_N_BYTES (1);
+                    /* movdi_internal vs. movv2di_internal.  */
+                    /* => mov (5 bytes) vs. movaps (7 bytes).  */
+                    else if (x86_64_immediate_operand (src, SImode))
+                      igain -= COSTS_N_BYTES (2);
+                    else
+                      /* ??? Larger immediate constants are placed in the
+                         constant pool, where the size benefit/impact of
+                         STV conversion is affected by whether and how
+                         often each constant pool entry is shared/reused.
+                         The value below is empirically derived from the
+                         CSiBE benchmark (and the optimal value may drift
+                         over time).  */
+                      igain += COSTS_N_BYTES (0);
+                  }
+                else
+                  {
+                    /* DImode can be immediate for TARGET_64BIT
+                       and SImode always.  */
+                    igain += m * COSTS_N_INSNS (1);
+                    igain -= vector_const_cost (src);
+                  }
+              }
             else if (MEM_P (dst))
-              igain += (m * ix86_cost->int_store[2]
-                        - ix86_cost->sse_store[sse_cost_idx]);
-            igain -= vector_const_cost (src);
+              {
+                igain += (m * ix86_cost->int_store[2]
+                          - ix86_cost->sse_store[sse_cost_idx]);
+                igain -= vector_const_cost (src);
+              }
             break;
 
           default:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4d4ab6a..5abf2a6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19982,8 +19982,6 @@ ix86_division_cost (const struct processor_costs *cost,
   return cost->divide[MODE_INDEX (mode)];
 }
 
-#define COSTS_N_BYTES(N) ((N) * 2)
-
 /* Return cost of shift in MODE.
    If CONSTANT_OP1 is true, the op1 value is known and set in OP1_VAL.
    AND_IN_OP1 specify in op1 is result of and and SHIFT_AND_TRUNCATE
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 21fe51b..edbfcaf 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -88,6 +88,11 @@ struct stringop_algs
   } size [MAX_STRINGOP_ALGS];
 };
 
+/* Analog of COSTS_N_INSNS when optimizing for size.  */
+#ifndef COSTS_N_BYTES
+#define COSTS_N_BYTES(N) ((N) * 2)
+#endif
+
 /* Define the specific costs for a given cpu.  NB: hard_register is used
    by TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST to compute
    hard register move costs by register allocator.  Relative costs of