https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
<bb 2> [local count: 1073741824]:
_2 = VIEW_CONVERT_EXPR<__v16qi>(x_3(D));
_6 = _2 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
_7 = VIEW_CONVERT_EXPR<vector(16) signed char>(_6);
_4 = __builtin_ia32_pmovmskb128 (_7);
_5 = _4 == 65535;
return _5;
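The GIMPLE above corresponds to something like the following reproducer (a hypothetical reconstruction; the actual testcase attached to the PR may differ in details such as the function name):

```c
#include <emmintrin.h>

/* Returns 1 iff every byte of x is zero: compare all 16 bytes
   against zero, collect the per-byte results with PMOVMSKB, and
   check that all 16 mask bits are set.  */
int
all_zero (__m128i x)
{
  return _mm_movemask_epi8 (_mm_cmpeq_epi8 (x, _mm_setzero_si128 ()))
         == 0xffff;
}
```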
so one likely reason is the builtin and, later on RTL, the UNSPEC for the
movemask operation.
combine does try the following, though:
Trying 8, 11, 13 -> 14:
8: r92:V16QI=r89:V16QI==r96:V2DI#0
REG_DEAD r96:V2DI
REG_DEAD r89:V16QI
11: r88:SI=unspec[r92:V16QI] 44
REG_DEAD r92:V16QI
13: flags:CCZ=cmp(r88:SI,0xffff)
REG_DEAD r88:SI
14: r95:QI=flags:CCZ==0
REG_DEAD flags:CCZ
Failed to match this instruction:
(set (reg:QI 95)
    (eq:QI (unspec:SI [
                (eq:V16QI (reg:V16QI 89)
                    (subreg:V16QI (reg:V2DI 96) 0))
            ] UNSPEC_MOVMSK)
        (const_int 65535 [0xffff])))
Of course I have my doubts whether the pattern is a useful one to optimize.
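For reference, the transform such a testcase would be after could be expressed directly with PTEST when SSE4.1 is available (an assumption on my part, not something the comment above proposes; the target attribute is only there so this compiles without -msse4.1):

```c
#include <smmintrin.h>

/* PTEST x,x sets ZF exactly when x is all-zero, so a single
   _mm_testz_si128 computes the same predicate as
   movmskb (x == 0) == 0xffff, without the compare and movemask.  */
__attribute__ ((target ("sse4.1")))
int
all_zero_ptest (__m128i x)
{
  return _mm_testz_si128 (x, x);
}
```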