https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908

--- Comment #33 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #32)
> (In reply to Hongtao.liu from comment #31)
> > Created attachment 52595
> > microbenchmark
> 
Interestingly, the microbenchmark didn't hit a store-forwarding stall on the znver2
client for overlapping 128-bit v2di/v2df/v2si/v4si/v2sf/v4sf loads, so why does
cray regress due to STFS (store-forwarding stalls)?
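A minimal C sketch of the overlapping-load shape in question. It assumes an x86-64 target with SSE2, and the function name straddling_load is hypothetical. It is not the attached microbenchmark. It only shows which bytes the load reads, two aligned 16-byte stores followed by an unaligned 16-byte load that straddles them. It does not time anything, so it cannot tell whether the hardware actually stalls.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics (x86 only) */

/* Hypothetical helper: two aligned 16-byte stores back to back, then an
   unaligned 16-byte load starting 8 bytes into the first store. The load's
   low half comes from store 1 and its high half from store 2. */
static void straddling_load(double buf[4], double out[2])
{
    /* _mm_store_pd requires a 16-byte-aligned address. */
    assert(((uintptr_t)buf & 15) == 0);
    __m128d v = _mm_set_pd(2.0, 1.0);  /* low lane 1.0, high lane 2.0 */
    _mm_store_pd(buf, v);              /* bytes  0..15 */
    _mm_store_pd(buf + 2, v);          /* bytes 16..31 */
    __m128d r = _mm_loadu_pd(buf + 1); /* bytes  8..23: straddles both stores */
    _mm_storeu_pd(out, r);
}
```

Called on a 16-byte-aligned double buf[4], out receives {2.0, 1.0}: the high lane of the first store followed by the low lane of the second.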

For the microbenchmark, the generated code looks like this:

        leaq    -1200(%rbp), %rsi
...
        vmovdqa %xmm0, -1088(%rbp)
        vmovdqa %xmm0, -1072(%rbp)
...
        movq    %rsi, %rdi


        vmovupd 120(%rsi), %xmm0
        vaddpd  120(%rdi), %xmm0, %xmm0
        vmovupd %xmm0, (%rdx)


Since %rsi = %rbp - 1200, 120(%rsi) is -1080(%rbp). The 16-byte load
vmovupd 120(%rsi), %xmm0 therefore takes its low half from the store
vmovdqa %xmm0, -1088(%rbp) and its high half from the store
vmovdqa %xmm0, -1072(%rbp).
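The byte accounting can be double-checked with a small C helper. The helper overlap_bytes and the constant names are hypothetical. The offsets are copied from the listing above and are %rbp-relative.

```c
#include <assert.h>

/* Number of bytes shared by the half-open ranges [a, a+alen) and [b, b+blen).
   Offsets are signed because they are %rbp-relative. */
static long overlap_bytes(long a, long alen, long b, long blen)
{
    assert(alen >= 0 && blen >= 0);
    long lo = a > b ? a : b;
    long hi = (a + alen) < (b + blen) ? (a + alen) : (b + blen);
    return hi > lo ? hi - lo : 0;
}

/* Offsets from the listing: %rsi = %rbp - 1200, so 120(%rsi) = -1080(%rbp).
   VLEN is the 16-byte width of the xmm stores and the load. */
enum {
    RSI_OFF    = -1200,
    LOAD_OFF   = RSI_OFF + 120,
    STORE1_OFF = -1088,
    STORE2_OFF = -1072,
    VLEN       = 16
};
```

Each store supplies 8 of the load's 16 bytes, so neither store alone covers the whole load.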


Full data:

type     NUM    scalar    vec       vecn      note
char     NUM2   2.66484   7.14645   2.26811   penalty
char     NUM4   3.17188   5.79971   2.22844   penalty
char     NUM8   4.06115   5.76087   2.25474   penalty
char     NUM16  5.84893   5.77123   2.23649   penalty
short    NUM2   2.6982    5.98521   2.25488   penalty
short    NUM4   3.15688   5.98339   2.25535   penalty
short    NUM8   4.10435   5.98285   2.25676   penalty
short    NUM16  5.92615   5.77799   2.24804   penalty
int      NUM2   2.72005   2.46749   2.25704   no penalty
int      NUM4   3.18113   2.46506   2.26846   no penalty
int      NUM8   4.01626   6.67516   2.27382   penalty
int      NUM16  5.92935   7.17056   10.0371   penalty
int64_t  NUM2   2.67302   2.48949   2.24273   no penalty
int64_t  NUM4   3.17415   7.80522   2.25004   penalty
int64_t  NUM8   4.07681   8.31397   10.0378   penalty
int64_t  NUM16  5.81931   7.85716   10.863    penalty
float    NUM2   2.67386   2.48      2.26215   no penalty
float    NUM4   3.17401   2.48121   2.23051   no penalty
float    NUM8   4.05976   7.16108   2.27791   penalty
float    NUM16  6.08089   7.61818   10.6009   penalty
double   NUM2   2.67811   2.46635   2.22982   no penalty
double   NUM4   3.19169   8.2489    2.25086   penalty
double   NUM8   4.05351   8.70083   -         penalty
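The vector modes named at the top (v2di, v2df, v2si, v4si, v2sf, v4sf) follow GCC's naming scheme of lane count plus scalar mode (QI, HI, SI, DI for 8/16/32/64-bit integers, SF for float, DF for double). Assuming that NUMn in the data means n elements of the given type per vector (an assumption, since the benchmark source is not reproduced here), the "no penalty" rows are exactly int NUM2/NUM4, int64_t NUM2, float NUM2/NUM4 and double NUM2, i.e. those six modes. A small C sketch of that mapping, using a hypothetical helper vector_mode:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* GCC scalar mode suffixes: qi = 8-bit, hi = 16-bit, si = 32-bit,
   di = 64-bit integer; sf = float, df = double. */
struct elem { const char *ctype; const char *mode; int bits; };

static const struct elem elems[] = {
    { "char", "qi", 8 },     { "short", "hi", 16 }, { "int", "si", 32 },
    { "int64_t", "di", 64 }, { "float", "sf", 32 }, { "double", "df", 64 },
};

/* Hypothetical helper: assuming NUMn means n lanes of ctype, write the GCC
   vector mode name (e.g. "v2df") into buf and return the vector width in
   bits, or -1 if ctype is not in the table. */
static int vector_mode(const char *ctype, int lanes, char *buf, size_t buflen)
{
    assert(lanes > 0 && buf != NULL && buflen > 0);
    for (size_t i = 0; i < sizeof elems / sizeof elems[0]; i++) {
        if (strcmp(elems[i].ctype, ctype) == 0) {
            snprintf(buf, buflen, "v%d%s", lanes, elems[i].mode);
            return lanes * elems[i].bits;
        }
    }
    return -1;
}
```

For example, vector_mode("double", 2, ...) yields "v2df" and 128 bits, while vector_mode("int", 2, ...) yields "v2si" and 64 bits.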
