https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109955
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
One thing I see is

-(insn 11 10 15 2 (set (subreg:V16QI (reg:V2DI 83 [ <retval> ]) 0)
-        (unspec:V16QI [
-                (reg:V16QI 92)
-                (reg:V16QI 91)
-                (lt:V16QI (reg:V16QI 90)
-                    (const_vector:V16QI [
-                            (const_int 0 [0]) repeated x16
-                        ]))
-            ] UNSPEC_BLENDV)) "/space/rguenther/src/gcc/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c":22:10 discrim 1 7431 {*sse4_1_pblendvb_lt} (nil)))))

vs

+(insn 8 5 9 2 (set (reg:V16QI 89)
+        (const_vector:V16QI [
+                (const_int -1 [0xffffffffffffffff]) repeated x16
+            ])) "/spc/abuild/rguenther/obj-gcc-g/gcc/include/smmintrin.h":181:20 1838 {movv16qi_internal}
+     (nil))
+(insn 9 8 11 2 (set (reg:V16QI 90)
+        (gt:V16QI (reg:V16QI 92)
+            (reg:V16QI 89))) "/spc/abuild/rguenther/obj-gcc-g/gcc/include/smmintrin.h":181:20 6749 {*sse2_gtv16qi3} (expr_list:REG_DEAD (reg:V16QI 92)
+        (expr_list:REG_DEAD (reg:V16QI 89)
+            (nil))))
+(note 11 9 12 2 NOTE_INSN_DELETED)
+(insn 12 11 16 2 (set (subreg:V16QI (reg:V2DI 84 [ <retval> ]) 0)
+        (unspec:V16QI [
+                (reg:V16QI 93)
+                (reg:V16QI 94)
+                (reg:V16QI 90)
+            ] UNSPEC_BLENDV)) "/space/rguenther/src/gcc/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c":22:10 discrim 1 7429 {sse4_1_pblendvb}
+     (expr_list:REG_DEAD (reg:V16QI 93)
+        (expr_list:REG_DEAD (reg:V16QI 90)
+            (expr_list:REG_DEAD (reg:V16QI 94) (nil)))))

after the combiner which seems to be a missing simplification of

(insn 8 5 9 2 (set (reg:V16QI 89)
        (const_vector:V16QI [
                (const_int -1 [0xffffffffffffffff]) repeated x16
            ]))
(insn 9 8 11 2 (set (reg:V16QI 90)
        (gt:V16QI (reg:V16QI 92)
            (reg:V16QI 89)))

to

(lt:V16QI (reg:V16QI 90)
    (const_vector:V16QI [
            (const_int 0 [0]) repeated x16
        ])

Trying 8 -> 9:
    8: r89:V16QI=const_vector
    9: r90:V16QI=r92:V16QI>r89:V16QI
      REG_DEAD r92:V16QI
      REG_DEAD r89:V16QI
Failed to match this instruction:
(set (reg:V16QI 90)
    (gt:V16QI (reg:V16QI 92)
        (const_vector:V16QI [
                (const_int -1 [0xffffffffffffffff]) repeated x16
            ])))

Trying 8, 9 -> 12:
    8: r89:V16QI=const_vector
    9: r90:V16QI=r92:V16QI>r89:V16QI
      REG_DEAD r92:V16QI
      REG_DEAD r89:V16QI
   12: r84:V2DI#0=unspec[r93:V16QI,r94:V16QI,r90:V16QI] 47
      REG_DEAD r93:V16QI
      REG_DEAD r90:V16QI
      REG_DEAD r94:V16QI
Failed to match this instruction:
(set (subreg:V16QI (reg:V2DI 84 [ <retval> ]) 0)
    (unspec:V16QI [
            (reg:V16QI 93)
            (reg:V16QI 94)
            (gt:V16QI (reg:V16QI 92)
                (const_vector:V16QI [
                        (const_int -1 [0xffffffffffffffff]) repeated x16
                    ]))
        ] UNSPEC_BLENDV))

not sure if the lt is a standalone thing.  Maybe we just need a
define-insn-and-split for _gt as well.  All those seem to be somewhat
tuned to the exact way RTL expansion works when the vcond patterns are there.

Getting rid of vcond* (but not vcond_mask) would allow quite some
simplification in middle-end code and the vectorizer.
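
For reference, the relation between the gt-against-minus-one and lt-against-zero masks can be checked with a scalar, per-byte model. This is only a sketch: the helper names (mask_gt_minus1, mask_lt_zero, blendv, check_blend_equivalences) are hypothetical and are not GCC or intrinsic APIs, and blendv only models the documented pblendvb behaviour of selecting on the top bit of each mask byte. It does not say anything about how the operands are ordered in the RTL dumps above. As plain masks the two compares are complements, so swapping one for the other is only valid if the two data operands of the blend are swapped as well.

```c
#include <assert.h>
#include <stdint.h>

/* Per-byte model of a signed compare x > -1: all ones when true. */
uint8_t mask_gt_minus1(int8_t x) { return x > -1 ? 0xFF : 0x00; }

/* Per-byte model of a signed compare x < 0: all ones when true. */
uint8_t mask_lt_zero(int8_t x) { return x < 0 ? 0xFF : 0x00; }

/* Per-byte model of pblendvb: take b when the mask byte's top bit is set,
   otherwise take a.  Only bit 7 of the mask matters. */
uint8_t blendv(uint8_t a, uint8_t b, uint8_t mask)
{
    return (mask & 0x80) ? b : a;
}

/* Exhaustively check all 256 byte values (hypothetical helper). */
void check_blend_equivalences(void)
{
    const uint8_t a = 0x11, b = 0x22;
    for (int v = -128; v <= 127; v++) {
        int8_t x = (int8_t)v;
        /* The two masks are bitwise complements of each other. */
        assert(mask_gt_minus1(x) == (uint8_t)~mask_lt_zero(x));
        /* Blending on (x > -1) equals blending on (x < 0) with the
           data operands swapped. */
        assert(blendv(a, b, mask_gt_minus1(x)) == blendv(b, a, mask_lt_zero(x)));
        /* Because only the sign bit is read, the (x < 0) compare can be
           dropped and x itself used as the mask. */
        assert(blendv(b, a, mask_lt_zero(x)) == blendv(b, a, (uint8_t)x));
    }
}
```

Calling check_blend_equivalences() passes without tripping an assertion. The last assertion is the property a pattern that folds an lt-against-zero into the blend can rely on; a corresponding _gt pattern would presumably need the operand swap shown in the second assertion.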