> On Thu, 17 Apr 2025, Jan Hubicka wrote:
> 
> > Hi,
> > Znver5 has addss cost of 2 while other common floating point SSE operations
> > cost 3 cycles.  We currently have only one entry in the costs tables which
> > makes it impossible to model this.  This patch adds sse_fp_op which is used
> > for other common FP operations (basically conversions) and updates code
> > computing costs.
> > 
> > The logic is that a typical integer SSE operation (say addition) is 1 cycle
> > and that corresponds to sse_op.  A "typical" SSE FP operation, i.e. one we
> > do not have a separate cost entry for (e.g. cvtss2sd), is 3 cycles.
> 
> What other "typical" SSE FP ops do we have?  The places you hit also

Among common operations, the Zen instructions that take 3 cycles and
could fall into this category are essentially cvtsd2ss, roundss,
vrndscaless and variants.  The latency table is not complete and I think
there are more (mostly accessible by builtins).

> happily trigger for say logical ops on SSE FP modes but those should
> use sse_op, they are as cheap as integer ones?

Yep, we should cost logicals as sse_op.
RTL costs do that (they also account NEG/FABS that way, since we implement
them using xorps), and I did not know we do logicals in FP mode in gimple.
How can I trigger them?
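
To illustrate why NEG/FABS belong with the logicals: they only touch the
sign bit, which is what an xorps/andps with a mask does.  A minimal sketch
in plain C, assuming IEEE binary32 float; neg_via_xor and abs_via_and are
made-up names for illustration, not GCC functions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustration only: flip the sign bit with an integer XOR, the scalar
   analogue of xorps with a sign-bit mask.  Assumes IEEE binary32.  */
static float
neg_via_xor (float x)
{
  uint32_t bits;
  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&bits, &x, sizeof bits);
  bits ^= UINT32_C (0x80000000);
  memcpy (&x, &bits, sizeof bits);
  return x;
}

/* Illustration only: clear the sign bit with an integer AND, the scalar
   analogue of andps with an inverted sign-bit mask.  */
static float
abs_via_and (float x)
{
  uint32_t bits;
  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&bits, &x, sizeof bits);
  bits &= UINT32_C (0x7fffffff);
  memcpy (&x, &bits, sizeof bits);
  return x;
}
```

Neither function does any FP arithmetic, which is why costing them like
an integer logical (sse_op) is reasonable.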
> 
> > Looking across the costing code, there are a few things that I think make
> > sense to work on incrementally.
> >  - add_stmt_cost accounts max/min as sse_op (1) while it is 2 for FP.
> >    This will need an extra entry.
> >  - add_stmt_cost does not seem to special case sqrt, FP->FP conversions
> >    and int<->fp conversions, which are all a bit different.
> >  - There is also a problem in the way constructors are modeled, since an
> >    integer->sse move is accounted as addss (now fp_op) while it probably
> >    should be derived from the integer_to_sse cost (on Zen, it is more
> >    expensive than a usual FP operation and we already have a cost entry
> >    for it).
> >    Again I guess a problem here is that we do not really know what we are
> >    constructing, and if the specific field of the vector is, say, constant
> >    or something doable in an SSE register, we won't need to pay the cost
> >    of an inter-unit move.
> 
> The add_stmt_cost for the SLP case tries to reverse engineer this a bit.
> 
> I'm not sure it makes sense to add a SSE FP op fallback - we should use
> more specific costs for special operations like min/max or sqrt.  Likewise
> bitwise ops should probably ignore that they might operate on FP modes
> (similar for shuffles).

We already have sqrt. I plan to add min/max incrementally.
> 
> For some of the current addss uses using sse_op might make more sense
> (like for the case of vec_construct).

I think in vec_construct we need to account for the int->sse moves.  Using
sse_op would make it look even cheaper than it currently does, and the PR
is about vectorizing integer stores from 4x64bit in integer regs to
1x256bit, which requires int->sse moves.
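
To make the concern concrete, here is a toy model of such a constructor
cost.  It is only a sketch under stated assumptions: toy_construct_cost,
its parameters and the numbers used with it are hypothetical and are not
the formula in ix86_builtin_vectorization_cost.  It assumes n elements are
merged with n - 1 cheap SSE operations and that each element living in an
integer register needs one int->sse move first:

```c
/* Toy model, not GCC code.  N is the number of scalar elements,
   FROM_INT_REGS is nonzero when the elements live in integer registers,
   SSE_OP and INT_TO_SSE are per-instruction costs in arbitrary units.  */
static int
toy_construct_cost (int n, int from_int_regs, int sse_op, int int_to_sse)
{
  /* Assumption: merging n elements takes n - 1 two-input operations.  */
  int combine = (n - 1) * sse_op;
  /* Assumption: one inter-unit move per element coming from an integer
     register; nothing extra if it is already in an SSE register.  */
  int moves = from_int_regs ? n * int_to_sse : 0;
  return combine + moves;
}
```

With made-up costs sse_op = 1 and int_to_sse = 3, four elements already in
SSE registers cost 3, while four elements in integer registers cost 15.
The inter-unit moves dominate, which is the part a plain sse_op estimate
would miss.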

I guess we can go without a kitchen-sink value for things we do not want
to special case.

Current uses are essentially: conversions, where I can add an explicit
sse_fpcvt cost; the ix86_builtin_vectorization kitchen sink for various
random stuff, which I can ignore; and the constructors, which we can
incrementally try to cost using int->sse instructions and sse_op.  Indeed,
no usual FP operation is included there.
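
Roughly, the split would look like the following toy table.  This is a
sketch, not the real struct processor_costs: the toy_* names are made up
for illustration, sse_fpcvt is the entry proposed above, and the values
are the plain cycle counts quoted in this thread for Znver5 (the real
tables use COSTS_N_INSNS):

```c
#include <assert.h>

/* Hypothetical operation classes, not GCC's tree or RTL codes.  */
enum toy_op { TOY_INT_ADD, TOY_FP_LOGICAL, TOY_FP_ADD, TOY_FP_CVT };

struct toy_costs
{
  int sse_op;     /* cheap SSE instruction (integer add, logicals).  */
  int addss;      /* ADDSS/SD SUBSS/SD.  */
  int sse_fpcvt;  /* FP conversions (proposed entry).  */
};

/* Znver5 latencies in cycles as discussed in this thread.  */
static const struct toy_costs toy_znver5 = { 1, 2, 3 };

/* Pick the cost entry by operation class rather than by mode, so that
   logicals on FP modes are as cheap as integer ones.  */
static int
toy_cost (const struct toy_costs *c, enum toy_op op)
{
  switch (op)
    {
    case TOY_INT_ADD:
    case TOY_FP_LOGICAL:
      return c->sse_op;
    case TOY_FP_ADD:
      return c->addss;
    case TOY_FP_CVT:
      return c->sse_fpcvt;
    }
  assert (!"unknown toy_op");
  return -1;
}
```

The point of the sketch is that no entry acts as a fallback: every class
maps to a specific cost, and anything not listed has to be added
explicitly.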

Honza
> 
> > Bootstrapped/regtested x86_64-linux.  Richi, I wonder if this makes sense
> > to you?
> > I know you plan to change vectorizer cost code this stage1...
> 
> Yes.
> 
> > gcc/ChangeLog:
> > 
> >     PR target/119298
> >     * config/i386/i386.cc (ix86_rtx_costs): Use sse_fp_op.
> >     (ix86_builtin_vectorization_cost): Use sse_fp_op.
> >     * config/i386/i386.h (struct processor_costs): Add sse_fp_op.
> >     * config/i386/x86-tune-costs.h (struct processor_costs): Update
> >     constructors
> > 
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index b172f716c68..3e8106bdd31 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -22482,14 +22482,14 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code_i, int opno,
> >        if (!SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P (mode))
> >     *total = 0;
> >        else
> > -        *total = ix86_vec_cost (mode, cost->addss);
> > +   *total = ix86_vec_cost (mode, cost->sse_fp_op);
> >        return false;
> >  
> >      case FLOAT_TRUNCATE:
> >        if (!SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P (mode))
> >     *total = cost->fadd;
> >        else
> > -        *total = ix86_vec_cost (mode, cost->addss);
> > +   *total = ix86_vec_cost (mode, cost->sse_fp_op);
> >        return false;
> >  
> >      case ABS:
> > @@ -24675,7 +24675,7 @@ ix86_builtin_vectorization_cost (enum 
> > vect_cost_for_stmt type_of_cost,
> >    switch (type_of_cost)
> >      {
> >        case scalar_stmt:
> > -        return fp ? ix86_cost->addss : COSTS_N_INSNS (1);
> > +   return fp ? ix86_cost->sse_fp_op : COSTS_N_INSNS (1);
> >  
> >        case scalar_load:
> >     /* load/store costs are relative to register move which is 2. Recompute
> > @@ -24689,7 +24689,7 @@ ix86_builtin_vectorization_cost (enum 
> > vect_cost_for_stmt type_of_cost,
> >  
> >        case vector_stmt:
> >          return ix86_vec_cost (mode,
> > -                         fp ? ix86_cost->addss : ix86_cost->sse_op);
> > +                         fp ? ix86_cost->sse_fp_op : ix86_cost->sse_op);
> >  
> >        case vector_load:
> >     index = sse_store_index (mode);
> > @@ -24759,12 +24759,12 @@ ix86_builtin_vectorization_cost (enum 
> > vect_cost_for_stmt type_of_cost,
> >       /* One vinserti128 for combining two SSE vectors for AVX256.  */
> >       else if (GET_MODE_BITSIZE (mode) == 256)
> >         return ((n - 2) * ix86_cost->sse_op
> > -               + ix86_vec_cost (mode, ix86_cost->addss));
> > +               + ix86_vec_cost (mode, ix86_cost->sse_fp_op));
> >       /* One vinserti64x4 and two vinserti128 for combining SSE
> >          and AVX256 vectors to AVX512.  */
> >       else if (GET_MODE_BITSIZE (mode) == 512)
> >         return ((n - 4) * ix86_cost->sse_op
> > -               + 3 * ix86_vec_cost (mode, ix86_cost->addss));
> > +               + 3 * ix86_vec_cost (mode, ix86_cost->sse_fp_op));
> >       gcc_unreachable ();
> >     }
> >  
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index 8507243d726..bb3620731ec 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -198,6 +198,8 @@ struct processor_costs {
> >                             /* Specify what algorithm
> >                                to use for stringops on unknown size.  */
> >    const int sse_op;                /* cost of cheap SSE instruction.  */
> > +  const int sse_fp_op;             /* cost of typical SSE FP instruction 
> > not
> > +                              listed below (such as conversion).  */
> >    const int addss;         /* cost of ADDSS/SD SUBSS/SD instructions.  */
> >    const int mulss;         /* cost of MULSS instructions.  */
> >    const int mulsd;         /* cost of MULSD instructions.  */
> > diff --git a/gcc/config/i386/x86-tune-costs.h 
> > b/gcc/config/i386/x86-tune-costs.h
> > index 9477345bdd7..d7f0f19ec55 100644
> > --- a/gcc/config/i386/x86-tune-costs.h
> > +++ b/gcc/config/i386/x86-tune-costs.h
> > @@ -122,6 +122,7 @@ struct processor_costs ix86_size_cost = {/* costs for 
> > tuning for size */
> >    COSTS_N_BYTES (2),                       /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_BYTES (2),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_BYTES (2),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_BYTES (2),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_BYTES (2),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_BYTES (2),                       /* cost of MULSD instruction.  
> > */
> > @@ -234,6 +235,7 @@ struct processor_costs i386_cost = {    /* 386 specific 
> > costs */
> >    COSTS_N_INSNS (122),                     /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (23),                      /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (23),                      /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (27),                      /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (27),                      /* cost of MULSD instruction.  
> > */
> > @@ -347,6 +349,7 @@ struct processor_costs i486_cost = {    /* 486 specific 
> > costs */
> >    COSTS_N_INSNS (83),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (8),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (8),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (16),                      /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (16),                      /* cost of MULSD instruction.  
> > */
> > @@ -458,6 +461,7 @@ struct processor_costs pentium_cost = {
> >    COSTS_N_INSNS (70),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (3),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of MULSD instruction.  
> > */
> > @@ -562,6 +566,7 @@ struct processor_costs lakemont_cost = {
> >    COSTS_N_INSNS (70),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (5),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (5),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (5),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (5),                       /* cost of MULSD instruction.  
> > */
> > @@ -681,6 +686,7 @@ struct processor_costs pentiumpro_cost = {
> >    COSTS_N_INSNS (56),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of MULSD instruction.  
> > */
> > @@ -791,6 +797,7 @@ struct processor_costs geode_cost = {
> >    COSTS_N_INSNS (54),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (6),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (6),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (11),                      /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (11),                      /* cost of MULSD instruction.  
> > */
> > @@ -904,6 +911,7 @@ struct processor_costs k6_cost = {
> >    COSTS_N_INSNS (56),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (2),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (2),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (2),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (2),                       /* cost of MULSD instruction.  
> > */
> > @@ -1017,6 +1025,7 @@ struct processor_costs athlon_cost = {
> >    COSTS_N_INSNS (35),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (2),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (4),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of MULSD instruction.  
> > */
> > @@ -1140,6 +1149,7 @@ struct processor_costs k8_cost = {
> >    COSTS_N_INSNS (35),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (2),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (4),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of MULSD instruction.  
> > */
> > @@ -1271,6 +1281,7 @@ struct processor_costs amdfam10_cost = {
> >    COSTS_N_INSNS (35),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (2),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (4),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of MULSD instruction.  
> > */
> > @@ -1394,6 +1405,7 @@ const struct processor_costs bdver_cost = {
> >    COSTS_N_INSNS (52),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (2),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (6),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (6),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (6),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (6),                       /* cost of MULSD instruction.  
> > */
> > @@ -1543,6 +1555,7 @@ struct processor_costs znver1_cost = {
> >    COSTS_N_INSNS (10),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (3),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of MULSD instruction.  
> > */
> > @@ -1702,6 +1715,7 @@ struct processor_costs znver2_cost = {
> >    COSTS_N_INSNS (10),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (3),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of MULSD instruction.  
> > */
> > @@ -1837,6 +1851,7 @@ struct processor_costs znver3_cost = {
> >    COSTS_N_INSNS (10),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (3),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of MULSD instruction.  
> > */
> > @@ -1974,6 +1989,7 @@ struct processor_costs znver4_cost = {
> >    COSTS_N_INSNS (25),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (3),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of MULSD instruction.  
> > */
> > @@ -2039,7 +2055,7 @@ struct processor_costs znver5_cost = {
> >                                        in 32,64,128,256 and 512-bit.  */
> >    {8, 8, 8, 12, 12},                       /* cost of storing SSE registers
> >                                        in 32,64,128,256 and 512-bit.  */
> > -  6, 8,                                    /* SSE->integer and integer->SSE
> > +  7, 9,                                    /* SSE->integer and integer->SSE
> >                                        moves.  */
> >    8, 8,                                    /* mask->integer and 
> > integer->mask moves */
> >    {6, 6, 6},                               /* cost of loading mask register
> > @@ -2118,9 +2134,10 @@ struct processor_costs znver5_cost = {
> >  
> >    /* SSE instructions have typical throughput 4 and latency 1.  */
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    /* ADDSS has throughput 2 and latency 2
> >       (in some cases when source is another addition).  */
> > -  COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> > +  COSTS_N_INSNS (2),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    /* MULSS has throughput 2 and latency 3.  */
> >    COSTS_N_INSNS (3),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of MULSD instruction.  
> > */
> > @@ -2265,6 +2282,7 @@ struct processor_costs skylake_cost = {
> >    COSTS_N_INSNS (20),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (4),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of MULSD instruction.  
> > */
> > @@ -2394,6 +2412,7 @@ struct processor_costs icelake_cost = {
> >    COSTS_N_INSNS (20),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (4),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of MULSD instruction.  
> > */
> > @@ -2517,6 +2536,7 @@ struct processor_costs alderlake_cost = {
> >    COSTS_N_INSNS (14),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (5),                       /* cost of MULSD instruction.  
> > */
> > @@ -2633,6 +2653,7 @@ const struct processor_costs btver1_cost = {
> >    COSTS_N_INSNS (35),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (2),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of MULSD instruction.  
> > */
> > @@ -2746,6 +2767,7 @@ const struct processor_costs btver2_cost = {
> >    COSTS_N_INSNS (35),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (2),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of MULSD instruction.  
> > */
> > @@ -2858,6 +2880,7 @@ struct processor_costs pentium4_cost = {
> >    COSTS_N_INSNS (43),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (2),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (4),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (6),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (6),                       /* cost of MULSD instruction.  
> > */
> > @@ -2973,6 +2996,7 @@ struct processor_costs nocona_cost = {
> >    COSTS_N_INSNS (44),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (2),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (5),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (5),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (7),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (7),                       /* cost of MULSD instruction.  
> > */
> > @@ -3086,6 +3110,7 @@ struct processor_costs atom_cost = {
> >    COSTS_N_INSNS (40),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (5),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (5),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (5),                       /* cost of MULSD instruction.  
> > */
> > @@ -3199,6 +3224,7 @@ struct processor_costs slm_cost = {
> >    COSTS_N_INSNS (40),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (5),                       /* cost of MULSD instruction.  
> > */
> > @@ -3326,6 +3352,7 @@ struct processor_costs tremont_cost = {
> >    COSTS_N_INSNS (14),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (5),                       /* cost of MULSD instruction.  
> > */
> > @@ -3439,6 +3466,7 @@ struct processor_costs intel_cost = {
> >    COSTS_N_INSNS (40),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (8),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (8),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (8),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (8),                       /* cost of MULSD instruction.  
> > */
> > @@ -3557,6 +3585,7 @@ struct processor_costs lujiazui_cost = {
> >    COSTS_N_INSNS (44),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (3),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (4),                       /* cost of MULSD instruction.  
> > */
> > @@ -3673,6 +3702,7 @@ struct processor_costs yongfeng_cost = {
> >    COSTS_N_INSNS (40),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (3),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of MULSD instruction.  
> > */
> > @@ -3789,6 +3819,7 @@ struct processor_costs shijidadao_cost = {
> >    COSTS_N_INSNS (44),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (3),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of MULSD instruction.  
> > */
> > @@ -3913,6 +3944,7 @@ struct processor_costs generic_cost = {
> >    COSTS_N_INSNS (14),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (5),                       /* cost of MULSD instruction.  
> > */
> > @@ -4042,6 +4074,7 @@ struct processor_costs core_cost = {
> >    COSTS_N_INSNS (23),                      /* cost of FSQRT instruction.  
> > */
> >  
> >    COSTS_N_INSNS (1),                       /* cost of cheap SSE 
> > instruction.  */
> > +  COSTS_N_INSNS (3),                       /* cost of SSE FP instruction.  
> > */
> >    COSTS_N_INSNS (3),                       /* cost of ADDSS/SD SUBSS/SD 
> > insns.  */
> >    COSTS_N_INSNS (4),                       /* cost of MULSS instruction.  
> > */
> >    COSTS_N_INSNS (5),                       /* cost of MULSD instruction.  
> > */
> > 
> 
> -- 
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
