[Bug target/92492] [AVX512F] Icc generate much better code for loop vectorization

2019-11-13 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92492

--- Comment #3 from Hongtao.liu ---
(In reply to Richard Biener from comment #2)
> ICC also uses effectively two vector sizes, v8qi and v8hi AFAICS? But
> why does it use %ymm then...

I think it's v8qi and v8si; icc uses vpmovzxbd, not vpmovzxbw.
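
For readers following the v8hi versus v8si question, here is a minimal scalar sketch of what the two widenings do per lane. It is only an illustration, not the code either compiler emits; the helper names widen_u8_to_u32 and widen_u8_to_u16 and the LANES constant are made up for this example. Eight bytes zero-extended to 32-bit lanes (the vpmovzxbd case) fill 8 * 4 = 32 bytes, i.e. a 256-bit %ymm register, while eight bytes zero-extended to 16-bit lanes (the vpmovzxbw case) fill only 16 bytes, i.e. a 128-bit %xmm register.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane count for this sketch: 8 elements, as in v8qi. */
enum { LANES = 8 };

/* v8qi -> v8si: zero-extend each byte to a 32-bit lane.
   The result is 8 * 4 = 32 bytes (256 bits). */
static void widen_u8_to_u32(const uint8_t in[LANES], uint32_t out[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = in[i];   /* unsigned source, so the upper bits become 0 */
}

/* v8qi -> v8hi: zero-extend each byte to a 16-bit lane.
   The result is 8 * 2 = 16 bytes (128 bits). */
static void widen_u8_to_u16(const uint8_t in[LANES], uint16_t out[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = in[i];
}
```

Under that reading, the use of %ymm would be consistent with a byte-to-dword widening rather than a byte-to-word one.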

[Bug target/92492] [AVX512F] Icc generate much better code for loop vectorization

2019-11-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92492

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|

[Bug target/92492] [AVX512F] Icc generate much better code for loop vectorization

2019-11-13 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92492

--- Comment #1 from Hongtao.liu ---
A much simpler case, which excludes the interference of pointer aliasing and an unknown loop count:

cat test.c:

typedef unsigned char uint8_t;

static inline uint8_t x264_clip_uint8( int x )
{
    return x&(~63) ? (-x)>>7 : x;
}

voi