https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---

> 
> One downside for a fully masked body is that we're using masked stores
> which usually have higher latency due to the "merge" semantics which
> means an extra memory input + merge operation.  Not sure if modern
> uArchs can optimize the all-ones mask case, the vectorizer, for
Also, I guess a masked store won't be store-forwarded, even when the load
lies entirely inside the masked store.
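
For reference, the "merge" semantics mentioned in the quote can be modelled
in plain scalar C. This is only a rough sketch of the architectural
behaviour: the name masked_store_model and the fixed 8-lane width are made
up for illustration, and it says nothing about the latency or the
store-forwarding behaviour of any real uArch.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8	/* assumed lane count, for illustration only */

/* Scalar model of a masked vector store (hypothetical helper, not a GCC
   or ISA interface).  Lanes whose mask bit is set are written from SRC;
   lanes whose mask bit is clear keep the old contents of DST, which is
   the "merge" with memory.  */
static void
masked_store_model (int32_t *dst, const int32_t *src, uint8_t mask)
{
  assert (dst && src);
  for (int i = 0; i < LANES; i++)
    if (mask & (1u << i))
      dst[i] = src[i];
}
```

With mask 0x0f only the low four lanes of the destination change, so a
later load of all eight lanes needs both the store data and the old memory
contents. With mask 0xff the result is the same as a plain store.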
