[Bug target/119298] [15 Regression] 538.imagick_r is faster when compiled with GCC 14.2 and -Ofast -flto -march=native than with master on Zen5 since r15-3441-g4292297a0f938f

rguenther at suse dot de via Gcc-bugs Wed, 09 Apr 2025 06:15:05 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298


--- Comment #13 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 9 Apr 2025, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119298
> 
> --- Comment #12 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
> > Btw, it was your r8-4018-gf6fd8f2bd4e9a9 which added the FP vs. non-FP 
> > difference.
> 
> Yep, I know.  With that patch I mostly wanted to limit redundancy of the
> tables. The int/Fp difference was mostly based on the observation that most of
> integer SSE operations (for example padd) take 1 cycle, while most of FP
> operations (like addss) take 3 cycles. My simplified understanding is that FP
> operations are usually pipelined to 3 cycles (since Pentium to today) because
> they include normalization, operation and rounding. The cost table is basically
> meant to have "typical cost" (sse_op and addss) along with all important
> exceptions (mul, div, fma, sqrt).
> 
> Zen5 is, I think, first CPU where addss has different timing than other basic
> FP arithmetic which makes addss itself an exception.  (Back then, I should have
> renamed addss cost and make the comment more descriptive.)
> 
> So based on this adding sse_fp_op (set to 3 on Zen5 and same cost as addss
> everywhere else) for "typical FP operation" and keep addss cost for actual FP
> add/sub (I will need to benchmark if sub is also 2 cycles; I am not sure about
> that) IMO makes sense.
> 
> But indeed we currently use addss for conversions and other stuff which is not
> necessarily good and we may want to add more entries for these.  Do you know
> what are important ones and ought to be fixed?

I'm not sure which ones are important, but we should try to get rid
of the fallback to call the "old" hook from add_stmt_cost so it's more
obvious from which context we come.  But I also expect this area to
change quite a bit next stage1 where I hope to revamp vectorizer
costing.

> I am OK with using addss cost of 3 for trunk&release branches and make this
> more precise next stage1.

That's what we use now?  But I still don't understand why exactly
538.imagick_r regresses.
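
For illustration only, the split proposed in the quoted comment (a separate
sse_fp_op entry for the "typical FP operation", with addss kept for actual FP
add/sub) could be sketched roughly as below.  This is a simplified,
hypothetical table, not the real i386 processor_costs layout: the struct, the
enum and op_cost are made up for the example, and the numbers are just the
cycle counts mentioned in the thread (1 for integer SSE ops, 3 for typical FP
ops, and an assumed 2 for addss on Zen5).

```c
/* Hypothetical, simplified cost table; NOT GCC's actual processor_costs.  */
struct vec_costs
{
  int sse_op;     /* Typical integer SSE operation, e.g. padd.  */
  int sse_fp_op;  /* Proposed entry: typical FP operation.  */
  int addss;      /* FP add/sub only.  */
};

/* Hypothetical classification of the statement being costed.  */
enum op_kind
{
  OP_INT_SIMPLE,  /* Simple integer SSE arithmetic.  */
  OP_FP_ADDSUB,   /* FP add or sub.  */
  OP_FP_OTHER     /* Other basic FP arithmetic, conversions, ...  */
};

/* Illustrative values: everywhere but Zen5, sse_fp_op equals addss.  */
static const struct vec_costs generic_costs = { 1, 3, 3 };

/* Assumption: addss is 2 cycles on Zen5, other basic FP ops stay at 3.  */
static const struct vec_costs zen5_costs = { 1, 3, 2 };

/* Pick the table entry for KIND instead of using addss for everything FP.  */
static int
op_cost (const struct vec_costs *costs, enum op_kind kind)
{
  switch (kind)
    {
    case OP_INT_SIMPLE:
      return costs->sse_op;
    case OP_FP_ADDSUB:
      return costs->addss;
    case OP_FP_OTHER:
      return costs->sse_fp_op;
    }
  return costs->sse_op;
}
```

With such a split, only FP add/sub would pick up the cheaper Zen5 cost, while
conversions and the other users of the addss entry would keep the "typical"
FP cost.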
