https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298

--- Comment #12 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
> Btw, it was your r8-4018-gf6fd8f2bd4e9a9 which added the FP vs. non-FP 
> difference.

Yep, I know.  With that patch I mostly wanted to limit redundancy in the
tables. The int/FP difference was mostly based on the observation that most
integer SSE operations (for example padd) take 1 cycle, while most FP
operations (like addss) take 3 cycles. My simplified understanding is that FP
operations are usually pipelined to 3 cycles (from the Pentium to today) because
they include normalization, operation and rounding. The cost table is basically
meant to hold a "typical cost" (sse_op and addss) along with all important
exceptions (mul, div, fma, sqrt).

Zen5 is, I think, the first CPU where addss has a different timing than other
basic FP arithmetic, which makes addss itself an exception.  (Back then, I
should have renamed the addss cost and made the comment more descriptive.)

So based on this, adding sse_fp_op (set to 3 on Zen5 and to the same cost as
addss everywhere else) for a "typical FP operation" and keeping the addss cost
for actual FP add/sub (I will need to benchmark whether sub is also 2 cycles; I
am not sure about that) IMO makes sense.
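
To make the proposed split concrete, here is a minimal sketch. It is not GCC's
actual struct processor_costs: the struct, the enum and the op_cost helper are
hypothetical names used only for illustration, values are plain latencies in
cycles, and the integer sse_op value of 1 for both tables is assumed from the
"most integer SSE operations take 1 cycle" observation above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cost table for illustration only; this is not
   GCC's actual struct processor_costs.  Values are latencies in cycles.  */
struct fp_cost_table {
  int sse_op;    /* typical integer SSE operation, e.g. padd */
  int addss;     /* actual FP add/sub only */
  int sse_fp_op; /* proposed entry: typical FP operation other than add/sub */
};

/* Illustrative operation classes (names are assumptions).  */
enum op_kind { OP_INT_SSE, OP_FP_ADDSUB, OP_FP_OTHER };

/* Everywhere except Zen5, sse_fp_op is the same as addss (3 cycles).  */
static const struct fp_cost_table generic_table = { 1, 3, 3 };

/* Zen5: addss is 2 cycles while the typical FP operation stays at 3.
   Whether sub is also 2 cycles still needs benchmarking.  */
static const struct fp_cost_table zen5_table = { 1, 2, 3 };

/* Pick the table entry that matches the operation class.  */
static int
op_cost (const struct fp_cost_table *table, enum op_kind kind)
{
  assert (table != NULL);
  switch (kind)
    {
    case OP_INT_SSE:   return table->sse_op;
    case OP_FP_ADDSUB: return table->addss;
    case OP_FP_OTHER:  return table->sse_fp_op;
    }
  /* Unknown class: fall back to the typical FP cost.  */
  return table->sse_fp_op;
}
```

With this layout only the Zen5 table distinguishes FP add/sub from other FP
operations; every other table would keep the two entries equal, so existing
behavior would be unchanged there.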

But indeed we currently use the addss cost for conversions and other stuff,
which is not necessarily good, and we may want to add more entries for these.
Do you know which ones are important and ought to be fixed?

I am OK with using an addss cost of 3 for trunk and release branches and making
this more precise in the next stage1.
