Re: Zen5 tuning part 1: avoid FMA chains

2024-09-30 Jan Hubicka
Hi,
>
> gcc/ChangeLog:
>
>   * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for
>   znver5.
>   (X86_TUNE_AVOID_256FMA_CHAINS): Likewise.
>   (X86_TUNE_AVOID_512FMA_CHAINS): Likewise.

This patch has now also been backported to the active branches.
Honza

Zen5 tuning part 1: avoid FMA chains

2024-09-03 Jan Hubicka
Hi,
testing matrix multiplication benchmarks shows that FMA on a critical chain is a performance loss compared to a separate multiply and add. While the FMA latency of 4 cycles is lower than that of a multiply followed by an add (3+2), the problem is that all three operands need to be ready before the FMA can start, whereas with a separate multiply and add the multiply does not have to wait for the addend. While on znver4 AVX512 code fa
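A toy latency model makes the critical-chain argument concrete for a reduction such as a dot product. It is only a sketch: it assumes the cycle counts quoted above (FMA 4, multiply 3, add 2), that all inputs a[i] and b[i] are ready at cycle 0, and unlimited multiply throughput, which real FP pipes do not provide. The helpers `fma_chain_cycles` and `mul_add_chain_cycles` are hypothetical names for illustration, not anything in GCC.

```c
#include <assert.h>

/* Latencies in cycles as quoted in the message (FMA 4, multiply 3,
   add 2).  Illustrative parameters, not measured values.  */
enum { FMA_LAT = 4, MUL_LAT = 3, ADD_LAT = 2 };

/* Hypothetical helper: critical-path length of n chained FMAs,
   acc = fma (a[i], b[i], acc).  Every FMA needs the previous acc,
   so each step adds the full FMA latency.  */
int
fma_chain_cycles (int n)
{
  assert (n >= 0);
  int acc_ready = 0;              /* cycle at which acc is available */
  for (int i = 0; i < n; i++)
    acc_ready += FMA_LAT;         /* wait for acc, then 4 more cycles */
  return acc_ready;
}

/* Hypothetical helper: the same reduction split into
   t = a[i] * b[i]; acc = acc + t.  The multiplies do not depend on
   acc, so (assuming unlimited multiply throughput) every product is
   ready at MUL_LAT and only the adds form the loop-carried chain.  */
int
mul_add_chain_cycles (int n)
{
  assert (n >= 0);
  int acc_ready = 0;
  for (int i = 0; i < n; i++)
    {
      int t_ready = MUL_LAT;      /* product issued at cycle 0 */
      int start = acc_ready > t_ready ? acc_ready : t_ready;
      acc_ready = start + ADD_LAT;
    }
  return acc_ready;
}
```

For a single step the FMA wins (4 cycles vs. 5), but for n = 8 the model gives 32 cycles for the FMA chain against 3 + 2 * 8 = 19 for the split version, because only the 2-cycle adds remain on the critical path.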