Re: Update X86_TUNE_AVOID_256FMA_CHAINS for znver2

2019-07-30 Thread Jan Hubicka
> Hi,
> this patch enables logic which avoids FMA for matrix multiplication loops
> for 256-bit vectors. The underlying issue is the same as with znver1. While
> the combined latency of multiply and add operations is higher than that of FMA, the
> dependency chain in matrix multiplication depends only on additions
>

Update X86_TUNE_AVOID_256FMA_CHAINS for znver2

2019-07-23 Thread Jan Hubicka
Hi, this patch enables logic which avoids FMA for matrix multiplication loops with 256-bit vectors. The underlying issue is the same as with znver1. While the combined latency of multiply and add operations is higher than that of FMA, the dependency chain in matrix multiplication depends only on additions that are fa
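
To make the dependency-chain argument concrete, here is a rough sketch in C. It is not code from the patch: `dot`, `chain_cycles_fused` and `chain_cycles_split` are hypothetical names, and the latency parameters are placeholders to be filled in from the vendor's documentation, not measured znver2 figures. The model only counts the loop-carried critical path and ignores throughput, vector width and unrolling.

```c
#include <assert.h>
#include <stddef.h>

/* Reduction of the kind found in a matrix multiplication inner loop.
   The compiler may emit either one FMA per iteration or a separate
   multiply and add; the source is the same in both cases. */
double dot(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];   /* acc is the loop-carried dependency */
    return acc;
}

/* Fused form: every FMA takes the previous acc as its addend, so the
   whole FMA latency sits on the chain in each iteration.
   lat_fma is a hypothetical latency in cycles. */
unsigned long chain_cycles_fused(unsigned long n, unsigned long lat_fma)
{
    return n * lat_fma;
}

/* Split form: the multiplies do not depend on acc, so they can run
   ahead of the chain; only the first one is exposed, and after that
   each iteration adds just the add latency.
   lat_mul and lat_add are hypothetical latencies in cycles. */
unsigned long chain_cycles_split(unsigned long n, unsigned long lat_mul,
                                 unsigned long lat_add)
{
    if (n == 0)
        return 0;
    return lat_mul + n * lat_add;
}
```

With placeholder latencies where lat_mul + lat_add is larger than lat_fma but lat_add alone is smaller, the split form ends up with the shorter chain for long loops, which is the situation the tuning flag is meant to address.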