https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115683

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao Liu from comment #6)
> (In reply to Uroš Bizjak from comment #5)
> > (In reply to Hongtao Liu from comment #0)
> > 
> > > g++: g++.target/i386/pr100637-1b.C 
> > > g++: g++.target/i386/pr100637-1w.C
> > > g++: g++.target/i386/pr103861-1.C
> > >
> > > An extra pcmpeq instruction is generated in the 3 testcases above for
> > > GTU comparisons. x86 doesn't support a native GTU comparison, so it
> > > uses psubusw + pcmpeq + pcmpeq; the second pcmpeq negates the mask,
> > > and the negation can be eliminated in the vcond{,u,eq} expanders by
> > > just swapping if_true and if_false.
> > 
> > How to do that? The output from vec_cmpu is a mask value in the output
> > register that is used by vcond_mask as an input. I fail to see how the swap
> > of if_true and if_false operands (in vcond_mask RTX) can be communicated
> > from vec_cmpu to vcond_mask.
> 
> One possible solution is to define a "fake" blendv pattern to help
> combine do the optimization, and then split this fake pattern back to
> (op1 & mask) | (op2 & ~mask) when !TARGET_SSE4_1.

Yes, let's go this way.
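For reference, a minimal sketch (in C with SSE2 intrinsics, illustrative
only, not the actual expander output) of the quoted codegen issue and the
operand swap that removes the second pcmpeq:

#include <emmintrin.h>  /* SSE2 */

/* r = (x >u y) ? a : b, per 16-bit lane.  SSE2 has no native unsigned
   compare, so the mask is derived from a saturating subtract.  */
static __m128i
blend_gtu_epu16 (__m128i x, __m128i y, __m128i a, __m128i b)
{
  /* psubusw: lanes become zero exactly where x <= y (unsigned).  */
  __m128i diff = _mm_subs_epu16 (x, y);
  /* pcmpeqw against zero: all-ones where x <= y, i.e. a LEU mask.
     A GTU mask would need one more pcmpeq (or pxor) to negate it.  */
  __m128i leu = _mm_cmpeq_epi16 (diff, _mm_setzero_si128 ());
  /* Instead of negating LEU into GTU, swap the blend arms: select b
     where x <= y and a elsewhere.  Same result, one instruction less.  */
  return _mm_or_si128 (_mm_and_si128 (leu, b),
                       _mm_andnot_si128 (leu, a));
}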
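And a similar sketch of the semantics the proposed "fake" blendv pattern
would split back to on !TARGET_SSE4_1 targets (the function name is made
up for illustration):

#include <emmintrin.h>
#ifdef __SSE4_1__
#include <smmintrin.h>  /* SSE4.1 */
#endif

/* Select a where the (all-ones/all-zeros) mask is set, b elsewhere.  */
static __m128i
blendv_sketch (__m128i a, __m128i b, __m128i mask)
{
#ifdef __SSE4_1__
  /* pblendvb picks from the second operand where the mask byte's high
     bit is set; with full 0/-1 masks this matches the bitwise form.  */
  return _mm_blendv_epi8 (b, a, mask);
#else
  /* The split for pre-SSE4.1 targets: (a & mask) | (b & ~mask).  */
  return _mm_or_si128 (_mm_and_si128 (mask, a),
                       _mm_andnot_si128 (mask, b));
#endif
}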

OTOH, I think removing vcondMN/vconduMN was a mistake. It is very hard to
communicate from vec_cmp{,u} to vcond_mask that we want to swap the
op_true/op_false operands, and this is quite important functionality for
targets that don't provide the full set of comparison operations.

Richi, maybe the tree optimizers can perform their optimizations with
vec_cmp{,u} and vcond_mask, and at the end provide the true conditional
vector move (that calls "vcond{,u}") as a compound operation of these two?
