https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86267
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed           |Added
----------------------------------------------------------------------------
             Target|                  |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED       |NEW
   Last reconfirmed|                  |2018-06-22
                 CC|                  |rguenth at gcc dot gnu.org
          Component|tree-optimization |target
     Ever confirmed|0                 |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
.optimized IL for the first testcase:

f (__m256i a, __m256i b)
{
  long long int bitmask;
  __m256i k;
  vector(4) double _1;
  vector(4) long long int _2;
  vector(4) long long int _3;
  int _8;
  vector(4) long long int _10;
  int _11;

  <bb 2> [local count: 1073741825]:
  k_6 = VEC_COND_EXPR <a_4(D) < b_5(D), { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
  _1 = VIEW_CONVERT_EXPR<__m256d>(k_6);
  _8 = __builtin_ia32_movmskpd256 (_1);
  _11 = _8 & 15;
  bitmask_9 = (long long int) _11;
  _2 = {bitmask_9, bitmask_9, bitmask_9, bitmask_9};
  _3 = _2 & { 1, 2, 4, 8 };
  _10 = VEC_COND_EXPR <_3 > { 0, 0, 0, 0 }, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
  return _10;
}

The Intel docs say _mm256_movemask_pd transfers the sign bits to the mask, so
any negative element has its mask bit set and any non-negative element has it
clear.

This suggests adding x86 backend GIMPLE folding of

  __builtin_ia32_movmskpd256 ( [VIEW_CONVERT<>] VEC_COND_EXPR <....> )

to an appropriate VEC_COND_EXPR yielding a vector<boolean>, view-converting
that to the integer-typed result of __builtin_ia32_movmskpd256.  That will of
course only be the first step in combining the whole thing away.  The & 15
could be elided based on the number of elements of the vector<bool>, and the
& { 1, 2, 4, 8 } could similarly be turned into vector<bool> operations.

Overall a lot of work for a no-op ;)  Did this testcase happen in real code?