https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #44 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #43)
> One thing I found by experiments:
> Insert 64 vaddps %xmm18, %xmm19, %xmm20 (no dependence between each other,
> just emulate for pipeline) before stalled load, stlf stall case is as fast
> as no stall cases on CLX. I guess this is "distance" you mean.
> But there's still event for STLF blocks, guess processor scheduler helps here.
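The experiment quoted above can be approximated with a small microbenchmark. It pads a store-to-load-forwarding (STLF) failure with 64 independent VADDPS instructions. This is a hypothetical sketch, not the code used in the bug report. It assumes x86-64 with GCC or Clang. Two 4-byte stores followed by one 8-byte load that spans both is used as an example of a load that cannot be forwarded. VADDPS on compiler-chosen xmm registers (AVX only) stands in for the AVX-512VL registers %xmm18..%xmm20 from the comment. The names `stlf_blocked_load`, `ns_per_iter` and `buf` are illustrative.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <string.h>
#include <time.h>

/* GCC/Clang vector extension: four packed floats held in one xmm register. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical scratch buffer; 16-byte aligned so the 8-byte load never
   crosses a cache line. */
static uint32_t buf[4] __attribute__((aligned(16)));

/* Two 4-byte stores, then one 8-byte load covering both. No single store
   supplies all 8 bytes, so store-to-load forwarding fails and the load has
   to wait for the stores to commit. If pad is set (and AVX is available),
   64 independent VADDPS are issued between the stores and the load. */
static uint64_t stlf_blocked_load(uint32_t lo, uint32_t hi, int pad)
{
    buf[0] = lo;                          /* 4-byte store */
    buf[1] = hi;                          /* 4-byte store */
    __asm__ volatile("" ::: "memory");    /* force both stores to memory */

    if (pad && __builtin_cpu_supports("avx")) {
        v4sf a = {1.0f, 2.0f, 3.0f, 4.0f}, b = a, c;
        /* Each VADDPS reads only a and b and writes c. "=&x" keeps c in a
           register distinct from the inputs, so there is no dependency
           chain between the 64 copies. The "memory" clobber stops the
           compiler from hoisting the load above the padding. */
        __asm__ volatile(".rept 64\n\tvaddps %1, %2, %0\n\t.endr"
                         : "=&x"(c)
                         : "x"(a), "x"(b)
                         : "memory");
        (void)c;
    }

    uint64_t v;
    memcpy(&v, buf, sizeof v);            /* single 8-byte load */
    return v;
}

/* Average wall-clock nanoseconds per call over iters iterations. */
static double ns_per_iter(int pad, int iters)
{
    struct timespec t0, t1;
    uint64_t sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        sink += stlf_blocked_load((uint32_t)i, (uint32_t)i * 3u, pad);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    __asm__ volatile("" : : "r"(sink));   /* keep the results alive */
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}
```

Comparing `ns_per_iter(0, N)` with `ns_per_iter(1, N)` gives the padded and unpadded cases. Running it under `perf stat -e ld_blocks.store_forward` on an Intel core shows whether blocked forwards are still counted, which is the observation in the quoted comment. No timings are claimed here. Results depend on the core and on the compiler's code generation around the loop.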