On 06/17/2016 08:16 AM, Ilya Enkovich wrote:
I do think you've got a legitimate question though. Ilya, can you give any
insights here based on your KNL and Haswell testing or data/insights from
the LLVM and/or ICC teams?
I have no information about LLVM. As I said in the other thread, ICC uses all
options (masked epilogue, combined loop, and vectorized epilogue with a smaller
vector size). It may also generate different versions (e.g. combined and
with masked epilogue) and choose between them dynamically depending on the iteration count.
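The three epilogue strategies listed above can be sketched roughly as follows. This is a minimal illustration, not ICC's or GCC's actual code generation. It assumes a trivial a[i] = b[i] + c[i] loop and emulates vector lanes with plain scalar loops and bit masks instead of real hardware masks (such as AVX-512 mask registers). The vector factors VF and VF_SMALL, all function names, and the dispatch threshold are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector factors: 8 lanes in the main loop, 4 in the
   smaller-vector epilogue. Lanes are emulated with scalar loops. */
#define VF 8
#define VF_SMALL 4

/* One emulated vector add of width w at offset base; lane k runs only if
   bit k of mask is set (stands in for a hardware-masked operation). */
static void vadd_masked(float *a, const float *b, const float *c,
                        size_t base, int w, unsigned mask) {
    for (int k = 0; k < w; k++)
        if (mask & (1u << k))
            a[base + k] = b[base + k] + c[base + k];
}

/* Mask with the low `active` lanes set (active never exceeds VF here). */
static unsigned lane_mask(size_t active) {
    return (1u << active) - 1u;
}

/* Unmasked vector add: all w lanes active. */
static void vadd(float *a, const float *b, const float *c, size_t base, int w) {
    vadd_masked(a, b, c, base, w, lane_mask((size_t)w));
}

/* 1. Full-width main loop, then a single masked epilogue iteration. */
static void add_masked_epilogue(float *a, const float *b, const float *c, size_t n) {
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        vadd(a, b, c, i, VF);
    if (i < n)
        vadd_masked(a, b, c, i, VF, lane_mask(n - i));
}

/* 2. Combined loop: every iteration is masked, so there is no separate epilogue. */
static void add_combined(float *a, const float *b, const float *c, size_t n) {
    for (size_t i = 0; i < n; i += VF)
        vadd_masked(a, b, c, i, VF, lane_mask(n - i < VF ? n - i : VF));
}

/* 3. Main loop, then one smaller-vector epilogue iteration, then scalar code. */
static void add_small_vec_epilogue(float *a, const float *b, const float *c, size_t n) {
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        vadd(a, b, c, i, VF);
    if (i + VF_SMALL <= n) {  /* fewer than VF elements remain, so at most once */
        vadd(a, b, c, i, VF_SMALL);
        i += VF_SMALL;
    }
    for (; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Runtime choice between two versions by trip count; the threshold is invented. */
static void add_dispatch(float *a, const float *b, const float *c, size_t n) {
    if (n < 2 * VF)
        add_combined(a, b, c, n);
    else
        add_masked_epilogue(a, b, c, n);
}

/* Returns 1 if every strategy matches a scalar reference for trip count n
   (n <= 64) and writes nothing past a[n - 1]; returns 0 otherwise. */
static int strategies_agree(size_t n) {
    float b[64], c[64], ref[64], out[64];
    if (n > 64)
        return 0;
    for (size_t i = 0; i < n; i++) {
        b[i] = (float)i;
        c[i] = 2.0f * (float)i;
        ref[i] = b[i] + c[i];
    }
    void (*fns[])(float *, const float *, const float *, size_t) = {
        add_masked_epilogue, add_combined, add_small_vec_epilogue, add_dispatch};
    for (size_t f = 0; f < sizeof fns / sizeof fns[0]; f++) {
        for (size_t i = 0; i < 64; i++)
            out[i] = -1.0f;  /* sentinel to catch out-of-range writes */
        fns[f](out, b, c, n);
        for (size_t i = 0; i < 64; i++)
            if (out[i] != (i < n ? ref[i] : -1.0f))
                return 0;
    }
    return 1;
}
```

For example, with n = 13 the masked-epilogue version runs one full 8-lane iteration and one 5-lane masked iteration. The smaller-vector version covers the same 13 elements as 8 + 4 lanes plus a single scalar iteration.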
Any guidance from the ICC team on the costing model to choose between
the different approaches?
I'm a bit surprised that there's enough value in doing this much work to
vectorize the epilogue, but that appears to be the case...
jeff