https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
--- Comment #12 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
> Btw, it was your r8-4018-gf6fd8f2bd4e9a9 which added the FP vs. non-FP
> difference.

Yep, I know. With that patch I mostly wanted to limit redundancy in the tables. The int/FP difference was mostly based on the observation that most integer SSE operations (for example padd) take 1 cycle, while most FP operations (like addss) take 3 cycles. My simplified understanding is that FP operations are usually pipelined to 3 cycles (from Pentium to today) because they include normalization, the operation itself, and rounding.

The cost table is basically meant to have the "typical cost" (sse_op and addss) along with all important exceptions (mul, div, fma, sqrt). Zen5 is, I think, the first CPU where addss has different timing than other basic FP arithmetic, which makes addss itself an exception. (Back then, I should have renamed the addss cost and made the comment more descriptive.)

So based on this, adding sse_fp_op (set to 3 on Zen5 and to the same cost as addss everywhere else) for the "typical FP operation" and keeping the addss cost for actual FP add/sub (I will need to benchmark whether sub is also 2 cycles; I am not sure about that) IMO makes sense.

But indeed we currently use addss for conversions and other stuff, which is not necessarily good, and we may want to add more entries for these. Do you know which ones are important and ought to be fixed?

I am OK with using an addss cost of 3 for trunk and release branches and making this more precise next stage1.
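To make the proposed split concrete, here is a rough sketch of how a separate sse_fp_op entry could sit next to addss. Everything in it is hypothetical: `struct toy_costs`, `enum toy_op`, `toy_cost` and the two tables are illustrative names, not GCC's actual `processor_costs` layout or API. The Zen5 addss value of 2 is only what the text above implies and still needs benchmarking.

```c
#include <assert.h>

/* Hypothetical, simplified cost table. This is NOT GCC's real
   struct processor_costs; field names merely mirror the discussion.  */
struct toy_costs {
  const char *name;
  int sse_op;     /* typical integer SSE operation, e.g. padd */
  int addss;      /* actual FP add/sub */
  int sse_fp_op;  /* proposed: typical FP operation other than add/sub */
};

/* Hypothetical operation classes used only for this illustration.  */
enum toy_op { TOY_INT_OP, TOY_FP_ADD, TOY_FP_OTHER };

/* Most CPUs: the typical FP operation costs the same as addss.  */
static const struct toy_costs toy_generic = { "generic", 1, 3, 3 };

/* Zen5 (assumed values): addss becomes the exception at 2 cycles,
   while the typical FP operation stays at 3.  */
static const struct toy_costs toy_zen5 = { "zen5", 1, 2, 3 };

/* Pick the table entry for an operation class.  */
int toy_cost(const struct toy_costs *c, enum toy_op op)
{
  switch (op) {
  case TOY_INT_OP:   return c->sse_op;
  case TOY_FP_ADD:   return c->addss;
  case TOY_FP_OTHER: return c->sse_fp_op;
  }
  assert(!"unknown toy_op");
  return -1;
}
```

With this split, conversions and similar operations would query the `TOY_FP_OTHER` class instead of borrowing the add/sub cost, so lowering addss on one CPU would not silently make them cheaper as well.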