https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100173
--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
> but yes, cselim will also sink the first store, moving it across the
> scalar compute in the block. I might note that ideally we'd sink
> all the compute as well and end up with just a conditional load of
> either pIn1->m_esState or pIn2_89->m_esState. That might then allow
> scheduling to recover the original performance.
>

I want to clarify that this regression is not related to the 2 sunk stores; it just triggers some micro-architectural bound.

Also, without -fvect-cost-model=very-cheap it can be 2-3x faster. The trip count is constant, so I wonder why the very-cheap cost model doesn't vectorize this loop?
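
For readers unfamiliar with the store sinking mentioned in the quoted text, here is a minimal sketch of the general shape of that transformation. It is not the testcase from this PR: the struct, function names, and parameters are hypothetical, and only the field name m_esState is borrowed from the quote. The first function stores in both arms of a branch; the second shows the sunk form, where each arm only loads a value and a single store follows the join.

```c
/* Hypothetical type for illustration only; not taken from the PR testcase. */
struct node {
    int m_esState;
};

/* Before sinking: each arm of the branch performs its own store to *out. */
void store_in_both_arms(int *out, const struct node *a,
                        const struct node *b, int cond)
{
    if (cond)
        *out = a->m_esState;
    else
        *out = b->m_esState;
}

/* After sinking: the arms only select a value (a conditional load),
   and one store is executed after the control-flow join. */
void store_after_join(int *out, const struct node *a,
                      const struct node *b, int cond)
{
    int tmp;
    if (cond)
        tmp = a->m_esState;
    else
        tmp = b->m_esState;
    *out = tmp;
}
```

Both functions write the same value for any input; the difference is only where the store sits relative to the surrounding computation, which is what can change scheduling.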