https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
--- Comment #9 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 11 Oct 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
>
> --- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
>
> > One downside for a fully masked body is that we're using masked stores
> > which usually have higher latency due to the "merge" semantics which
> > means an extra memory input + merge operation. Not sure if modern
> > uArchs can optimize the all-ones mask case, the vectorizer, for
> Also I guess a masked store can't be store-forwarded even if the load is
> inside the masked store.

I guess the masking of the store is resolved in the load-store unit and not
by splitting the operation into a load-modify-store sequence, because that
cannot easily hide exceptions (faults on masked-out lanes). So yes, a masked
store in the store buffer likely cannot act as a forwarding source (though
the actual mask should be fully resolved there), since the actual merging
takes place later.
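To make the "merge" semantics discussed above concrete, here is a minimal scalar model in C. It is only a sketch under stated assumptions: the function name masked_store_model and the 8-lane width are hypothetical, and it models the architectural result of a masked store (unselected lanes keep their old memory contents), not how a load-store unit or store buffer implements it. On x86, AVX-512 exposes real masked stores through intrinsics such as _mm512_mask_storeu_ps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane count for this model (not tied to any real ISA width). */
#define LANES 8

/* Hypothetical scalar model of a masked vector store with merge semantics:
   lane i of dst receives src[i] only if bit i of mask is set; otherwise
   dst[i] keeps whatever was already in memory. This models the result only,
   not the hardware mechanism (store buffer, fault suppression, etc.). */
static void masked_store_model(float *dst, const float *src, uint8_t mask)
{
    for (size_t i = 0; i < LANES; i++) {
        if (mask & (1u << i))
            dst[i] = src[i];   /* selected lane: take the new value */
        /* unselected lane: dst[i] keeps its old memory value (the "merge") */
    }
}
```

With mask 0x0F only lanes 0..3 are written and lanes 4..7 retain their previous contents; with an all-ones mask (0xFF) the operation degenerates into a plain store, which is the case the thread wonders whether hardware special-cases. The model only illustrates the semantics; whether a given microarchitecture can forward from a masked store is exactly the open question being discussed.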