https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115683
--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao Liu from comment #6)
> (In reply to Uroš Bizjak from comment #5)
> > (In reply to Hongtao Liu from comment #0)
> > >
> > > g++: g++.target/i386/pr100637-1b.C
> > > g++: g++.target/i386/pr100637-1w.C
> > > g++: g++.target/i386/pr103861-1.C
> > >
> > > There is one extra pcmpeq instruction generated in the three
> > > testcases above for a GTU comparison. x86 has no native GTU
> > > comparison, so it uses psubusw + pcmpeq + pcmpeq; the second pcmpeq
> > > negates the mask, and that negation could be eliminated in the
> > > vcond{,u,eq} expander by simply swapping if_true and if_false.
> >
> > How to do that? The output from vec_cmpu is a mask value in the output
> > register that is used by vcond_mask as an input. I fail to see how the
> > swap of the if_true and if_false operands (in the vcond_mask RTX) can
> > be communicated from vec_cmpu to vcond_mask.
>
> One possible solution is to define a "fake" blendv pattern to help
> combine do the optimization, and then split this fake pattern back to
> op1 & mask | op2 & ~mask when !TARGET_SSE4_1.

Yes, let's go this way.

OTOH, I think removing vcondMN/vconduMN was a mistake. It is very hard to
communicate from vec_cmp{,u} to vcond_mask that we want to swap the
op_true/op_false operands, and this is quite important functionality for
targets that don't provide the complete set of comparison operations.

Richi, maybe the tree optimizers can perform their optimizations with
vec_cmp{,u} and vcond_mask, and at the end provide the true conditional
vector move (that calls "vcond{,u}") as a compound operation of these two
operations?
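To make the negate-elimination concrete, here is a minimal SSE2 intrinsics
sketch (hypothetical helper names, not GCC code) of the sequence the
testcases currently get, next to the equivalent form with the un-negated
mask and swapped select arms:

#include <emmintrin.h>

/* GTU select on 16-bit unsigned elements, the direct way: the mask
   negation below is the extra pcmpeq the testcases now emit.  */
static __m128i
sel_gtu_negated (__m128i a, __m128i b, __m128i t, __m128i f)
{
  __m128i zero = _mm_setzero_si128 ();
  /* a <=u b  <==>  (a -us b) == 0: psubusw + pcmpeqw.  */
  __m128i leu = _mm_cmpeq_epi16 (_mm_subs_epu16 (a, b), zero);
  /* Negate to get a >u b: the second, redundant pcmpeqw.  */
  __m128i gtu = _mm_cmpeq_epi16 (leu, zero);
  /* vcond_mask without SSE4.1: t & mask | f & ~mask.  */
  return _mm_or_si128 (_mm_and_si128 (gtu, t),
                       _mm_andnot_si128 (gtu, f));
}

/* Same result from the un-negated LEU mask with the arms swapped;
   the second pcmpeqw disappears.  */
static __m128i
sel_gtu_swapped (__m128i a, __m128i b, __m128i t, __m128i f)
{
  __m128i leu = _mm_cmpeq_epi16 (_mm_subs_epu16 (a, b),
                                 _mm_setzero_si128 ());
  return _mm_or_si128 (_mm_and_si128 (leu, f),
                       _mm_andnot_si128 (leu, t));
}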
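And a sketch of the shape the proposed "fake" blendv pattern would let
combine form (again hypothetical; on SSE4.1 this stays a real pblendvb,
otherwise the splitter would restore the and/andnot/or sequence above):

#include <smmintrin.h>

/* pblendvb selects from its second source where the mask byte's high
   bit is set, so blending with the un-negated LEU mask swaps the arms
   for free: f where a <=u b, t elsewhere.  */
static __m128i
sel_gtu_blendv (__m128i a, __m128i b, __m128i t, __m128i f)
{
  __m128i leu = _mm_cmpeq_epi16 (_mm_subs_epu16 (a, b),
                                 _mm_setzero_si128 ());
  return _mm_blendv_epi8 (t, f, leu);
}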