Hi,
>
> gcc/ChangeLog:
>
> * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for
> znver5.
> (X86_TUNE_AVOID_256FMA_CHAINS): Likewise.
> (X86_TUNE_AVOID_512FMA_CHAINS): Likewise.
This patch has now also been backported to the active branches.
Honza
Hi,
testing matrix multiplication benchmarks shows that FMA on a critical chain
is a performance loss over separate multiply and add. While the latency of 4
is lower than multiply + add (3+2), the problem is that all values need to
be ready before computation starts.
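
To illustrate the latency argument, here is a rough back-of-envelope model
in C. It is only a sketch: the latencies 4, 3 and 2 are the ones quoted
above, while the function names and the assumption that the multiplies do
not depend on the accumulator and are fully pipelined are hypothetical
simplifications, not something measured here.

```c
/* Latencies in cycles, as quoted in the text above. */
enum { FMA_LATENCY = 4, MUL_LATENCY = 3, ADD_LATENCY = 2 };

/*
 * Loop shape 1 (FMA on the chain):
 *     acc = fma(a[i], b[i], acc);
 * The FMA cannot start until the previous acc is available, so every
 * step adds the full FMA latency to the dependency chain.
 */
unsigned long fma_chain_cycles(unsigned long n)
{
    return n * FMA_LATENCY;
}

/*
 * Loop shape 2 (separate multiply and add):
 *     prod = a[i] * b[i];   // does not depend on acc
 *     acc  = acc + prod;
 * Assumption: the multiplies run ahead of the chain, so after the first
 * product is ready only the add latency accumulates per step.
 */
unsigned long mul_add_chain_cycles(unsigned long n)
{
    if (n == 0)
        return 0;
    return MUL_LATENCY + n * ADD_LATENCY;
}
```

Under these assumptions a single operation is faster as an FMA (4 versus
3+2 = 5 cycles), but from two chained steps onward the separate multiply
and add come out ahead (8 versus 7 cycles), and the gap grows with the
length of the chain.
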
While on znver4 AVX512 code fa