https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908

--- Comment #32 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #31)
> Created attachment 52595
> microbenchmark

The microbenchmark measures the store-forwarding stall (STFS) penalty. I ran it
on CLX (Cascade Lake) and found that one stalled vector load is faster than 16
scalar loads, slightly slower than 8 scalar loads, and far slower than 4 (or
fewer) scalar loads.

Type/Num   2         4         8         16
char/s     3.01308   3.57279   4.524     6.52829
char/v     5.77472   4.97372   4.97573   4.83359
char/vn    2.51209   2.51137   2.51168   2.51139
short/s    3.01211   3.5156    4.55842   6.5292
short/v    5.1863    5.18539   5.08339   5.56546
short/vn   2.51186   2.51204   2.51106   2.51095
int/s      3.01316   3.51603   4.66614   6.53379
int/v      5.87912   5.9016    6.40174   6.61226
int/vn     2.51149   2.51148   2.51107   2.64337
int64/s    3.01267   3.57062   5.32924   6.69231
int64/v    6.842     7.34315   7.66509   7.93031
int64/vn   2.51195   2.51127   2.6445    2.90873
float/s    3.01294   3.56799   5.42716   8.03185
float/v    7.28071   7.28184   7.78232   8.45706
float/vn   2.51211   2.5105    2.51272   2.65844
double/s   3.01343   3.78715   5.80704   8.03236
double/v   8.28379   8.78754   9.51308   10.3075
double/vn  2.51226   2.51126   2.64533   2.91103

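Num:     number of elements (for type/s, the number of scalar loads)
type/s:  scalar loads
type/v:  vector load with the STFS penalty
type/vn: vector load without the penalty
Values are timings; lower is faster.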
