On 06/17/2016 08:16 AM, Ilya Enkovich wrote:

I do think you've got a legitimate question though.   Ilya, can you give any
insights here based on your KNL and Haswell testing or data/insights from
the LLVM and/or ICC teams?

I have no information about LLVM.  As I said in the other thread, ICC uses all
options (masked epilogue, combined loop, vectorized epilogue with smaller
vector size).  It may also generate different versions (e.g. combined and
with masked epilogue) and choose dynamically depending on the iteration count.
Any guidance from the ICC team on the costing model to choose between the different approaches?
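
For reference, the three epilogue strategies named above can be sketched in
plain C. This is only an illustration under stated assumptions: the vector
lanes are emulated with a scalar inner loop, the widths VF and HALF_VF and all
function names are hypothetical, and the threshold in add_dispatch is made up.
It does not reflect the actual cost model of ICC or GCC.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 8, HALF_VF = 4 };  /* hypothetical vector widths, in lanes */

/* Emulates one vector add where only the first `active` lanes are enabled. */
static void vec_add_masked(float *c, const float *a, const float *b,
                           size_t lanes, size_t active)
{
    assert(active <= lanes);
    for (size_t l = 0; l < lanes; l++)
        if (l < active)                 /* lane mask: inactive lanes are not touched */
            c[l] = a[l] + b[l];
}

/* Full-width main loop followed by a single masked epilogue iteration. */
void add_masked_epilogue(float *c, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        vec_add_masked(c + i, a + i, b + i, VF, VF);
    if (i < n)
        vec_add_masked(c + i, a + i, b + i, VF, n - i);
}

/* Combined loop: every iteration is masked, so there is no separate epilogue. */
void add_combined(float *c, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i += VF) {
        size_t rem = n - i;
        vec_add_masked(c + i, a + i, b + i, VF, rem < VF ? rem : VF);
    }
}

/* Full-width main loop, half-width vector epilogue, then a scalar tail. */
void add_small_vector_epilogue(float *c, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        vec_add_masked(c + i, a + i, b + i, VF, VF);
    for (; i + HALF_VF <= n; i += HALF_VF)
        vec_add_masked(c + i, a + i, b + i, HALF_VF, HALF_VF);
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Runtime choice based on iteration count; the threshold is an assumption. */
void add_dispatch(float *c, const float *a, const float *b, size_t n)
{
    if (n < VF)
        add_combined(c, a, b, n);         /* too short for a full-width iteration */
    else
        add_masked_epilogue(c, a, b, n);
}
```

All four functions compute the same result; they differ only in how the
n % VF leftover iterations are handled, which is the part a cost model would
have to weigh.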

I'm a bit surprised that there's enough value in doing this much work to vectorize the epilogue, but that appears to be the case...

jeff
