[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

rguenth at gcc dot gnu.org via Gcc-bugs Wed, 09 Apr 2025 04:34:17 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298


--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #7)
> Hmm, the sequence does not use + at all, but I think I know what is going
> on. While the field is called addss it is used as a kitchen sink for all
> other simple operations.  
>       /* pmuludq under sse2, pmuldq under sse4.1, for sign_extend,
>          require extra 4 mul, 4 add, 4 cmp and 2 shift.  */
>       if (!TARGET_SSE4_1 && !uns_p)
>         extra_cost = (cost->mulss + cost->addss + cost->sse_op) * 4
>                       + cost->sse_op * 2;

this looks OK?
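
For reference, a minimal standalone sketch of that formula, checking that
it charges 4 mul, 4 add, 4 cmp and 2 shift. The struct, the function name
and the numbers are made up for illustration; they are not the i386.cc
types or any real tuning-table values.

```cpp
#include <cassert>

// Hypothetical stand-in for the three cost fields the formula reads.
struct mul_costs {
  int mulss;   // cost of one multiply
  int addss;   // cost of one add
  int sse_op;  // cost of one cheap SSE op (used for cmp and shift)
};

// Same expression as in the quoted snippet:
// 4 * (mul + add + cmp) + 2 * shift.
constexpr int widen_mul_extra_cost(const mul_costs &c) {
  return (c.mulss + c.addss + c.sse_op) * 4 + c.sse_op * 2;
}
```

With illustrative costs mulss = 3, addss = 2, sse_op = 1 this gives
(3 + 2 + 1) * 4 + 1 * 2 = 26.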

> ....
>     case FLOAT_EXTEND:
>       if (!SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P (mode))
>         *total = 0;
>       else
>         *total = ix86_vec_cost (mode, cost->addss);

not sure why we use addss instead of sse_op here?

>       return false;
> ....
>     case FLOAT_TRUNCATE:
>       if (!SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P (mode)) 
>         *total = cost->fadd;
>       else
>         *total = ix86_vec_cost (mode, cost->addss);

likewise?  but this is rtx_cost

>       return false;
> ...
>       case scalar_stmt:
>         return fp ? ix86_cost->addss : COSTS_N_INSNS (1);

we shouldn't get here, the add_stmt_cost "hook" handles operations
separately.

> ....
>       case vector_stmt:
>         return ix86_vec_cost (mode,
>                               fp ? ix86_cost->addss : ix86_cost->sse_op);

I'm not sure why we use addss cost in case of FP mode.  So these would be
the only two places to try fixing.

The question is for which ops we fall back here.  The following might tell:

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 4f8380c4a58..fe1beefe732 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25346,6 +25346,7 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
            stmt_cost = ix86_cost->add;
          break;
        default:
+         gcc_unreachable ();
          break;
        }
     }


> Only addss was sped up (and apparently only in common contexts), other
> simple SSE operations are still 3 cycles.
> 
> We have
>   const int sse_op;             /* cost of cheap SSE instruction.  */
>   const int addss;              /* cost of ADDSS/SD SUBSS/SD instructions.  */
> 
> SSE_OP is used for integer SSE instructions, which are typically 1 cycle, so
> perhaps we also want to have sse_fp_op /* cost of cheap SSE fp instruction. */
> in addition to addss.
> 
> But to be precise builtin_vectorizer cost would need to know if
> scalar/vector_stmt is addition or something else, which AFAIK it doesn't

It does see it
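
A rough sketch of the split suggested above, as a standalone toy rather
than a patch: sse_op and addss mirror the existing fields, sse_fp_op is
the proposed new field, and op_kind and simple_op_cost are invented here
purely for illustration.  The cycle counts used with it are placeholders,
not values from any real tuning table.

```cpp
#include <cassert>

// Hypothetical, simplified cost table.  sse_op and addss mirror the
// fields quoted above; sse_fp_op is the proposed addition.
struct sse_costs {
  int sse_op;     // cheap integer SSE instruction
  int addss;      // ADDSS/SD, SUBSS/SD
  int sse_fp_op;  // other cheap SSE FP instruction (proposed)
};

// Invented classification of a "simple" operation.
enum class op_kind { int_simple, fp_add_sub, fp_other_simple };

// Pick the cost by operation kind instead of using addss as the
// catch-all for every FP operation.
int simple_op_cost(const sse_costs &c, op_kind k) {
  switch (k) {
    case op_kind::int_simple:      return c.sse_op;
    case op_kind::fp_add_sub:      return c.addss;
    case op_kind::fp_other_simple: return c.sse_fp_op;
  }
  return c.sse_op;  // not reached for valid op_kind values
}
```

With placeholder costs { sse_op = 1, addss = 2, sse_fp_op = 3 }, only a
real FP add/sub gets the cheaper addss cost while other simple FP
operations keep the higher one.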
