https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298

--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Hmm, the sequence does not use + at all, but I think I know what is going on.
While the field is called addss, it is used as a kitchen sink for all other
simple operations, for example:
      /* pmuludq under sse2, pmuldq under sse4.1, for sign_extend,
         require extra 4 mul, 4 add, 4 cmp and 2 shift.  */
      if (!TARGET_SSE4_1 && !uns_p)
        extra_cost = (cost->mulss + cost->addss + cost->sse_op) * 4
                      + cost->sse_op * 2;
....
    case FLOAT_EXTEND:
      if (!SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P (mode))
        *total = 0;
      else
        *total = ix86_vec_cost (mode, cost->addss);
      return false;
....
    case FLOAT_TRUNCATE:
      if (!SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P (mode)) 
        *total = cost->fadd;
      else
        *total = ix86_vec_cost (mode, cost->addss);
      return false;
...
      case scalar_stmt:
        return fp ? ix86_cost->addss : COSTS_N_INSNS (1);
....
      case vector_stmt:
        return ix86_vec_cost (mode,
                              fp ? ix86_cost->addss : ix86_cost->sse_op);

Only addss was sped up (and apparently only in common contexts); other simple
SSE operations still take 3 cycles.

We have
  const int sse_op;             /* cost of cheap SSE instruction.  */
  const int addss;              /* cost of ADDSS/SD SUBSS/SD instructions.  */

SSE_OP is used for integer SSE instructions, which typically take 1 cycle, so
perhaps we also want
  const int sse_fp_op;          /* cost of cheap SSE fp instruction.  */
in addition to addss.

But to be precise, builtin_vectorization_cost would need to know whether a
scalar_stmt/vector_stmt is an addition or something else, which AFAIK it doesn't.
