https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115683
--- Comment #6 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #5)
> (In reply to Hongtao Liu from comment #0)
>
> > g++: g++.target/i386/pr100637-1b.C
> > g++: g++.target/i386/pr100637-1w.C
> > g++: g++.target/i386/pr103861-1.C
> >
> > There're extra 1 pcmpeq instruction generated in below 3 testcase for
> > comparison of GTU, x86 doesn't support native GTU comparison, but use
> > psubusw + pcmpeq + pcmpeq, the second pcmpeq is used to negate the mask, and
> > the negate can be eliminated in vcond{,u,eq} expander by just swapping
> > if_true and if_else.
>
> How to do that? The output from vec_cmpu is a mask value in the output
> register that is used by vcond_mask as an input. I fail to see how the swap
> of if_true and if_false operands (in vcond_mask RTX) can be communicated
> from vec_cmpu to vcond_mask.

One possible solution is to define a "fake" blendv pattern that lets the
combine pass do the optimization, and then split this fake pattern back to
op1 & mask | op2 & ~mask when !TARGET_SSE4_1.
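
To illustrate why the second pcmpeq is redundant, here is a minimal per-lane
sketch in plain C. It is only a scalar model of the instructions discussed
above, not GCC's expander or pattern code; all function names (subus16,
cmpeq16, select16, select_gtu_negated, select_gtu_swapped, variants_agree)
are hypothetical and exist only for this example.

```c
#include <assert.h>  /* only needed for checks such as assert(variants_agree()) */
#include <stdint.h>

/* Per-lane model of psubusw: unsigned saturating subtract. */
static uint16_t subus16(uint16_t a, uint16_t b)
{
  return a > b ? (uint16_t)(a - b) : 0;
}

/* Per-lane model of pcmpeqw: all-ones if equal, zero otherwise. */
static uint16_t cmpeq16(uint16_t a, uint16_t b)
{
  return a == b ? 0xFFFF : 0;
}

/* Mask select without blendv: op1 & mask | op2 & ~mask. */
static uint16_t select16(uint16_t mask, uint16_t op1, uint16_t op2)
{
  return (uint16_t)((op1 & mask) | (op2 & (uint16_t)~mask));
}

/* a >u b ? if_true : if_false, using psubusw + pcmpeq + pcmpeq. */
static uint16_t select_gtu_negated(uint16_t a, uint16_t b,
                                   uint16_t if_true, uint16_t if_false)
{
  uint16_t leu = cmpeq16(subus16(a, b), 0);  /* mask for a <=u b */
  uint16_t gtu = cmpeq16(leu, 0);            /* second pcmpeq: negate the mask */
  return select16(gtu, if_true, if_false);
}

/* Same result with one pcmpeq fewer: keep the <=u mask, swap the arms. */
static uint16_t select_gtu_swapped(uint16_t a, uint16_t b,
                                   uint16_t if_true, uint16_t if_false)
{
  uint16_t leu = cmpeq16(subus16(a, b), 0);  /* mask for a <=u b */
  return select16(leu, if_false, if_true);
}

/* Compare both variants against a plain a > b select on a coarse grid. */
static int variants_agree(void)
{
  for (uint32_t a = 0; a <= 0xFFFF; a += 257)
    for (uint32_t b = 0; b <= 0xFFFF; b += 257)
      {
        uint16_t expected = a > b ? 0x1111 : 0x2222;
        uint16_t x = select_gtu_negated((uint16_t)a, (uint16_t)b, 0x1111, 0x2222);
        uint16_t y = select_gtu_swapped((uint16_t)a, (uint16_t)b, 0x1111, 0x2222);
        if (x != expected || y != expected)
          return 0;
      }
  return 1;
}
```

The sketch only shows that select(~mask, x, y) and select(mask, y, x) are
interchangeable per lane. How that swap is exposed to the compiler (for
example through the fake blendv pattern that combine can match) is the open
question in this thread.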