On Mon, Dec 8, 2025 at 3:59 PM Richard Biener <[email protected]> wrote:
>
> On Mon, 8 Dec 2025, Uros Bizjak wrote:
>
> > On Mon, Dec 8, 2025 at 2:41 PM Richard Biener <[email protected]> wrote:
> > >
> > > The following adjusts costing of vector construction from scalars for
> > > FP modes which with 387 math can reside in FP regs which need spilling
> > > to be reloaded to XMM. I've played on the safe side with mixed
> > > SSE/387 math.
> > >
> > > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > >
> > > OK?
> > >
> > > Thanks,
> > > Richard.
> > >
> > >         PR target/121230
> > >         * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> > >         With FP mode and 387 math cost spill/reload.
> > >
> > >         * gcc.target/i386/pr121230.c: New testcase.
> > > ---
> > >  gcc/config/i386/i386.cc                  | 15 ++++++++++++++-
> > >  gcc/testsuite/gcc.target/i386/pr121230.c | 16 ++++++++++++++++
> > >  2 files changed, 30 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr121230.c
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index db43045753b..ad978d7474d 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -26397,7 +26397,20 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
> > >                                    (TREE_OPERAND (gimple_assign_rhs1 (def), 0))))))
> > >             {
> > >               if (fp)
> > > -               m_num_sse_needed[where]++;
> > > +               {
> > > +                 /* Scalar FP values residing in x87 registers need to be
> > > +                    spilled and reloaded. */
> > > +                 if (ix86_fpmath & FPMATH_387)
> >
> > Perhaps you can use the IS_STACK_MODE() macro, it determines more
> > precisely which mode is handled in stack registers.
>
> Sure (though practically vectorized are only SFmode and DFmode?).

Please note that the macro also handles support for SFmode and DFmode
with TARGET_SSE / TARGET_SSE2.

> Like the following.

Yes.

> Re-testing on x86_64-unknown-linux-gnu, OK?

OK.

Thanks,
Uros.

>
> Thanks,
> Richard.
>
> From 69c63b06daf193fcc5fa2e0093db0d4198b75432 Mon Sep 17 00:00:00 2001
> From: Richard Biener <[email protected]>
> Date: Mon, 8 Dec 2025 14:36:58 +0100
> Subject: [PATCH] target/121230 - x86 vector CTOR cost with 387 math
> To: [email protected]
>
> The following adjusts costing of vector construction from scalars for
> FP modes which with 387 math can reside in FP regs which need spilling
> to be reloaded to XMM. I've played on the safe side with mixed
> SSE/387 math.
>
>         PR target/121230
>         * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
>         With FP mode and 387 math cost spill/reload.
>
>         * gcc.target/i386/pr121230.c: New testcase.
> ---
>  gcc/config/i386/i386.cc                  | 15 ++++++++++++++-
>  gcc/testsuite/gcc.target/i386/pr121230.c | 16 ++++++++++++++++
>  2 files changed, 30 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr121230.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index db43045753b..75a9cb6211a 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -26397,7 +26397,20 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
>                                    (TREE_OPERAND (gimple_assign_rhs1 (def), 0))))))
>             {
>               if (fp)
> -               m_num_sse_needed[where]++;
> +               {
> +                 /* Scalar FP values residing in x87 registers need to be
> +                    spilled and reloaded. */
> +                 auto mode2 = TYPE_MODE (TREE_TYPE (op));
> +                 if (IS_STACK_MODE (mode2))
> +                   {
> +                     int cost
> +                       = (ix86_cost->hard_register.fp_store[mode2 == SFmode
> +                                                            ? 0 : 1]
> +                          + ix86_cost->sse_load[sse_store_index (mode2)]);
> +                     stmt_cost += COSTS_N_INSNS (cost) / 2;
> +                   }
> +                 m_num_sse_needed[where]++;
> +               }
>               else
>                 {
>                   m_num_gpr_needed[where]++;
> diff --git a/gcc/testsuite/gcc.target/i386/pr121230.c b/gcc/testsuite/gcc.target/i386/pr121230.c
> new file mode 100644
> index 00000000000..67c9c5ccb2d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr121230.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-O3 -march=athlon-xp -mfpmath=387 -fexcess-precision=standard" } */
> +
> +typedef struct {
> +  float a;
> +  float b;
> +} f32_2;
> +
> +f32_2 add32_2(f32_2 x, f32_2 y) {
> +  return (f32_2){ x.a + y.a, x.b + y.b};
> +}
> +
> +/* We do not want the vectorizer to vectorize the store and/or the
> +   conversion (with IA32 we do not support V2SF add) given that spills
> +   FP regs to reload them to XMM. */
> +/* { dg-final { scan-assembler-not "movss\[ \\t\]+\[0-9\]*\\\(%esp\\\), %xmm" } } */
> --
> 2.51.0
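
For readers following the arithmetic in the revised hunk, here is a small standalone C sketch of the extra cost it charges per x87-resident scalar: an x87 store plus an SSE load, scaled by COSTS_N_INSNS and halved. Everything named toy_* and the function x87_to_xmm_extra_cost are hypothetical stand-ins and not GCC code; the two-entry tables and their indexing (0 for SFmode, 1 for DFmode) are simplifying assumptions, and the cost values in the usage note are made up. Only the COSTS_N_INSNS definition mirrors the one in GCC's rtl.h.

```c
#include <assert.h>

/* Same definition as in GCC's rtl.h: one instruction is 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-in for the relevant slice of a processor cost
   table; the real tables in GCC are larger and differently laid out.  */
struct toy_costs
{
  int fp_store[2]; /* x87 store cost; assumed index 0 = SFmode, 1 = DFmode.  */
  int sse_load[2]; /* SSE load cost; assumed index 0 = SFmode, 1 = DFmode.  */
};

/* Hypothetical mode enumeration, used only to index the toy tables.  */
enum toy_mode { TOY_SF = 0, TOY_DF = 1 };

/* Extra statement cost for one scalar that lives in an x87 register and
   must be spilled and reloaded into an XMM register, following the
   formula in the patch: COSTS_N_INSNS (fp_store + sse_load) / 2.  */
static int
x87_to_xmm_extra_cost (const struct toy_costs *c, enum toy_mode m)
{
  assert (m == TOY_SF || m == TOY_DF);
  int cost = c->fp_store[m] + c->sse_load[m];
  return COSTS_N_INSNS (cost) / 2;
}
```

With made-up table values fp_store = {4, 6} and sse_load = {6, 8}, the SFmode case adds (4 + 6) * 4 / 2 = 20 cost units per scalar and the DFmode case adds (6 + 8) * 4 / 2 = 28.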
