[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

crazylht at gmail dot com via Gcc-bugs Thu, 10 Mar 2022 05:55:19 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908


--- Comment #32 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #31)
> Created attachment 52595 [details]
> microbenchmark

The microbenchmark measures the STFS penalty. I ran it on CLX and found that
1 stalled vector load is faster than 16 scalar loads, slightly slower than
8 scalar loads, and much slower than 4 (or fewer) scalar loads.
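
For readers without the attachment, here is a rough sketch of the two access
patterns being compared. It is not the attached microbenchmark. It assumes the
penalty in question is a store-forwarding stall, i.e. a wide load that reads
back data written by several narrower stores still in flight. The function
names, the v4si typedef and the COMPILER_BARRIER macro are made up for
illustration, and whether the memcpy actually becomes a single vector load
depends on the compiler and flags.

```c
#include <string.h>

/* GCC/Clang vector extension: four 32-bit ints in one 16-byte vector. */
typedef int v4si __attribute__((vector_size(16)));

/* Compiler-only barrier (hypothetical helper) so the stores and the
   reloads are not folded into register moves. */
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

/* Four scalar stores followed by four scalar loads: each load matches
   one earlier store in size and address, so it can be forwarded. */
int sum_scalar_reload(int *buf, int a, int b, int c, int d)
{
    buf[0] = a; buf[1] = b; buf[2] = c; buf[3] = d;
    COMPILER_BARRIER();
    return buf[0] + buf[1] + buf[2] + buf[3];
}

/* The same four scalar stores followed by one 16-byte load: the load
   spans four separate stores, which is the pattern expected to stall. */
int sum_vector_reload(int *buf, int a, int b, int c, int d)
{
    v4si v;
    buf[0] = a; buf[1] = b; buf[2] = c; buf[3] = d;
    COMPILER_BARRIER();
    memcpy(&v, buf, sizeof v);   /* typically one unaligned vector load */
    return v[0] + v[1] + v[2] + v[3];
}
```

Both functions return the same sum, so a timing loop around each one would
differ only in how the data is reloaded after the stores.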

Num/Type  char/s   char/v   char/vn  short/s  short/v  short/vn
2         3.01308  5.77472  2.51209  3.01211  5.1863   2.51186
4         3.57279  4.97372  2.51137  3.5156   5.18539  2.51204
8         4.524    4.97573  2.51168  4.55842  5.08339  2.51106
16        6.52829  4.83359  2.51139  6.5292   5.56546  2.51095

Num/Type  int/s    int/v    int/vn   int64/s  int64/v  int64/vn
2         3.01316  5.87912  2.51149  3.01267  6.842    2.51195
4         3.51603  5.9016   2.51148  3.57062  7.34315  2.51127
8         4.66614  6.40174  2.51107  5.32924  7.66509  2.6445
16        6.53379  6.61226  2.64337  6.69231  7.93031  2.90873

Num/Type  float/s  float/v  float/vn  double/s  double/v  double/vn
2         3.01294  7.28071  2.51211   3.01343   8.28379   2.51226
4         3.56799  7.28184  2.5105    3.78715   8.78754   2.51126
8         5.42716  7.78232  2.51272   5.80704   9.51308   2.64533
16        8.03185  8.45706  2.65844   8.03236   10.3075   2.91103


type/s: scalar
type/v: vector with penalty
type/vn: vector w/o penalty
