Hi,
this patch enables the logic that avoids FMA in the matrix multiplication loop
for 256-bit vectors. The underlying issue is the same as with znver1: while
the combined latency of separate multiply and add operations is higher than
that of FMA, the dependency chain in matrix multiplication runs only through
additions
that are fa