https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #33 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #32)
> (In reply to Hongtao.liu from comment #31)
> > Created attachment 52595 [details]
> > microbenchmark
> Interestingly, the microbenchmark didn't hit a store forwarding stall on the
> znver2 client for overlapping 128-bit v2di/v2df/v2si/v4si/v2sf/v4sf loads,
> but for cray there's a regression due to STFS???

For the microbenchmark, the code looks like

        leaq    -1200(%rbp), %rsi
        ...
        vmovdqa %xmm0, -1088(%rbp)
        vmovdqa %xmm0, -1072(%rbp)
        ...
        movq    %rsi, %rdi
        vmovupd 120(%rsi), %xmm0
        vaddpd  120(%rdi), %xmm0, %xmm0
        vmovupd %xmm0, (%rdx)

120(%rsi) is equal to -1080(%rbp), so vmovupd 120(%rsi), %xmm0 reads the upper
half of vmovdqa %xmm0, -1088(%rbp) and the lower half of
vmovdqa %xmm0, -1072(%rbp).

whole data:

char
  NUM2   scalar: 2.66484   vec: 7.14645: penalty   vecn: 2.26811
  NUM4   scalar: 3.17188   vec: 5.79971: penalty   vecn: 2.22844
  NUM8   scalar: 4.06115   vec: 5.76087: penalty   vecn: 2.25474
  NUM16  scalar: 5.84893   vec: 5.77123: penalty   vecn: 2.23649
short
  NUM2   scalar: 2.6982    vec: 5.98521: penalty   vecn: 2.25488
  NUM4   scalar: 3.15688   vec: 5.98339: penalty   vecn: 2.25535
  NUM8   scalar: 4.10435   vec: 5.98285: penalty   vecn: 2.25676
  NUM16  scalar: 5.92615   vec: 5.77799: penalty   vecn: 2.24804
int
  NUM2   scalar: 2.72005   vec: 2.46749: no!!      vecn: 2.25704
  NUM4   scalar: 3.18113   vec: 2.46506: no!!      vecn: 2.26846
  NUM8   scalar: 4.01626   vec: 6.67516: penalty   vecn: 2.27382
  NUM16  scalar: 5.92935   vec: 7.17056: penalty   vecn: 10.0371
int64_t
  NUM2   scalar: 2.67302   vec: 2.48949: no!!      vecn: 2.24273
  NUM4   scalar: 3.17415   vec: 7.80522: penalty   vecn: 2.25004
  NUM8   scalar: 4.07681   vec: 8.31397: penalty   vecn: 10.0378
  NUM16  scalar: 5.81931   vec: 7.85716: penalty   vecn: 10.863
float
  NUM2   scalar: 2.67386   vec: 2.48: no!!         vecn: 2.26215
  NUM4   scalar: 3.17401   vec: 2.48121: no!!      vecn: 2.23051
  NUM8   scalar: 4.05976   vec: 7.16108: penalty   vecn: 2.27791
  NUM16  scalar: 6.08089   vec: 7.61818: penalty   vecn: 10.6009
double
  NUM2   scalar: 2.67811   vec: 2.46635: no!!      vecn: 2.22982
  NUM4   scalar: 3.19169   vec: 8.2489: penalty    vecn: 2.25086
  NUM8   scalar: 4.05351   vec: 8.70083: penalty