https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98348
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #8)
> Created attachment 49806 [details]
> gcc11-pr98348.patch
>
> So, if we go for GCC11 the way of pre-reload define_insn_and_split, this is
> some incremental untested progress on your patch (just the sse.md part of
> it).
> Changes:
> 1) it is undesirable to put SUBREGs on the SET_DEST side, as it prevents
> other optimizations later on
> 2) I don't see the point on the TARGET_AVX512BW ||, the insn in the end is
> plain AVX or AVX2 or SSE4* etc. one
> 3) handles also the const0 vector_all_ones order
> 4) for commutative cases allows any operand order, for others ensures the
> right one of the operands is register

The pattern below should be equivalent to the const0 vector_all_ones order,
but the generic part didn't simplify it, so I made some adjustments to those
patterns. I also changed ix86_expand_sse_movcc to generate a common NOT
operator instead of gen_knot<mode>, which has UNSPEC_MASKOP inside, so that
combine can do the right thing.

Successfully matched this instruction:
(set (reg:V16QI 82 [ <retval> ])
    (vec_merge:V16QI (const_vector:V16QI [
                (const_int -1 [0xffffffffffffffff]) repeated x16
            ])
        (const_vector:V16QI [
                (const_int 0 [0]) repeated x16
            ])
        (not:HI (unspec:HI [
                    (reg:V16QI 89)
                    (reg:V16QI 90)
                    (const_int 4 [0x4])
                ] UNSPEC_PCMP))))

> 5) handles also the LE case by swapping the comparison operands
>
> The patch doesn't handle the cases where based on the comparison one sets up
> floating vectors, as can be seen e.g. in:
> typedef float V128 __attribute__ ((vector_size(16)));
> typedef float V256 __attribute__ ((vector_size(32)));
> typedef float V512 __attribute__ ((vector_size(64)));
>
> V128
> foo (V128 x)
> {
>   const union U { unsigned u; float f; } u = { -1U };
>   return x > 0.0f ? u.f : 0.0f;
> }
>
> V256
> bar (V256 x)
> {
>   const union U { unsigned u; float f; } u = { -1U };
>   return x > 0.0f ? u.f : 0.0f;
> }

I'm adding a new predicate named float_vector_all_ones_operand and a
corresponding define_insn_and_split to handle it.

Successfully matched this instruction:
(set (reg:V4SF 82 [ <retval> ])
    (vec_merge:V4SF (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16 A128])
        (const_vector:V4SF [
                (const_double:SF 0.0 [0x0.0p+0]) repeated x4
            ])
        (unspec:QI [
                (reg:V4SF 84)
                (reg:V4SF 88)
                (const_int 1 [0x1])
            ] UNSPEC_PCMP)))
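To see why both matched instructions can be split into a plain vector compare, here is a minimal scalar C model of the lane semantics. It is a sketch under stated assumptions, not GCC code: vec_merge_v16qi, pcmp_ne_mask, pcmp_ne_vector, int_merge_identities_hold and float_select_matches_mask are hypothetical names. The model follows the documented RTL vec_merge rule (a set mask bit takes the lane from the first operand, a clear bit from the second). It uses a "not equal" predicate only because the V16QI match carries immediate 4, which is assumed here to be the AVX-512 NE encoding. It checks that merging all-ones/zero under mask k reproduces the compare, and that the NOT-mask form equals the const0 vector_all_ones order. For the float case, the all-ones value u.f is a NaN bit pattern, so results are compared as bits rather than as floats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of RTL (vec_merge A B MASK) on 16 byte lanes:
   lane i is taken from A when bit i of MASK is set, otherwise from B.  */
static void vec_merge_v16qi(uint8_t out[16], const uint8_t a[16],
                            const uint8_t b[16], uint16_t mask)
{
    for (int i = 0; i < 16; i++)
        out[i] = ((mask >> i) & 1u) ? a[i] : b[i];
}

/* Model of an AVX-512 style compare into a mask register (x != y).  */
static uint16_t pcmp_ne_mask(const uint8_t x[16], const uint8_t y[16])
{
    uint16_t k = 0;
    for (int i = 0; i < 16; i++)
        if (x[i] != y[i])
            k |= (uint16_t)(1u << i);
    return k;
}

/* Model of an SSE/AVX compare into a vector: 0xff where x != y, else 0.  */
static void pcmp_ne_vector(uint8_t out[16], const uint8_t x[16],
                           const uint8_t y[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (x[i] != y[i]) ? 0xff : 0x00;
}

/* Checks the integer identities for one pair of inputs.  */
static bool int_merge_identities_hold(const uint8_t x[16], const uint8_t y[16])
{
    uint8_t ones[16], zeros[16], cmp[16], a[16], b[16], c[16];
    memset(ones, 0xff, sizeof ones);
    memset(zeros, 0x00, sizeof zeros);

    uint16_t k = pcmp_ne_mask(x, y);
    pcmp_ne_vector(cmp, x, y);

    vec_merge_v16qi(a, ones, zeros, k);             /* all-ones, zero, k  */
    vec_merge_v16qi(b, ones, zeros, (uint16_t)~k);  /* all-ones, zero, ~k */
    vec_merge_v16qi(c, zeros, ones, k);             /* zero, all-ones, k  */

    if (memcmp(a, cmp, 16) != 0)  /* merge under k is the compare itself */
        return false;
    if (memcmp(b, c, 16) != 0)    /* NOT mask equals swapped constants */
        return false;
    for (int i = 0; i < 16; i++)  /* ... i.e. the inverted compare */
        if (b[i] != (uint8_t)~cmp[i])
            return false;
    return true;
}

/* Float case from the quoted testcase: u.f has the bit pattern 0xffffffff
   (a NaN), so each lane is compared as bits, never as a float value.  */
static bool float_select_matches_mask(const float x[4])
{
    const union { uint32_t u; float f; } u = { UINT32_MAX };
    for (int i = 0; i < 4; i++) {
        float sel = x[i] > 0.0f ? u.f : 0.0f;  /* x > 0.0f ? u.f : 0.0f */
        uint32_t bits;
        memcpy(&bits, &sel, sizeof bits);
        uint32_t mask_lane = x[i] > 0.0f ? UINT32_MAX : 0u;  /* cmpps lane */
        if (bits != mask_lane)
            return false;
    }
    return true;
}
```

For example, with x = {0, 1, ..., 15} and y equal to x except in every third lane (lanes 0, 3, 6, 9, 12, 15), the mask k is 0x9249 and all three integer identities hold.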