On Thu, Nov 29, 2018 at 10:54 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Thu, Nov 29, 2018 at 9:00 AM Jakub Jelinek <ja...@redhat.com> wrote:
> >
> > Hi!
> >
> > The following patch optimizes
> > -       pxor    %xmm3, %xmm3
> > -       pcmpgtb %xmm0, %xmm3
> > -       movdqa  %xmm3, %xmm0
> >         pblendvb        %xmm0, %xmm1, %xmm2
> >         movdqa  %xmm2, %xmm0
> >         ret
> >
> > -       vpxor   %xmm3, %xmm3, %xmm3
> > -       vpcmpgtq        %ymm0, %ymm3, %ymm0
> > -       vpblendvb       %ymm0, %ymm2, %ymm1, %ymm0
> > +       vblendvpd       %ymm0, %ymm2, %ymm1, %ymm0
> >         ret
> >
> > etc.  As the *blendv* instructions only look at the most significant
> > bit, we don't really need to perform pcmpgt* or vpcmpgt* instructions;
> > while they set also the other bits based on the most significant one,
> > the only consumer doesn't care about those other bits.
> >
> > I believe we can't do this for floating point comparisons even with
> > -ffast-math, because -fno-signed-zeros isn't a guarantee that -0.0 won't
> > appear, just that it will appear randomly when 0.0 is wanted and vice versa,
> > and having x < 0.0 be suddenly false if x is -0.0 would IMHO break too much
> > code.
>
> I agree with the above. This would mean that a comparison x < 0.0
> would be substituted with an equivalent to a signbit (). We don't do
> this even for -ffast-math or -funsafe-math-optimizations.
>
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2018-11-28  Jakub Jelinek  <ja...@redhat.com>
> >
> >         PR target/54700
> >         * config/i386/sse.md (ssebytemode): Add V16SI, V8SI and V4SI
> >         entries.
> >         (ssefltmodesuffix, ssefltvecmode): New define_mode_attrs.
> >         (*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_lt,
> >         *<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_ltint,
> >         *<sse4_1_avx2>_pblendvb_lt): New define_insns.
> >
> >         * g++.target/i386/sse4_1-pr54700-1.C: New test.
> >         * g++.target/i386/sse4_1-pr54700-2.C: New test.
> >         * g++.target/i386/avx-pr54700-1.C: New test.
> >         * g++.target/i386/avx-pr54700-2.C: New test.
> >         * g++.target/i386/avx2-pr54700-1.C: New test.
> >         * g++.target/i386/avx2-pr54700-2.C: New test.
> >         * g++.target/i386/sse4_1-check.h: New file.
> >         * g++.target/i386/avx-check.h: New file.
> >         * g++.target/i386/avx2-check.h: New file.
> >         * g++.target/i386/m128-check.h: New file.
> >         * g++.target/i386/m256-check.h: New file.
> >         * g++.target/i386/avx-os-support.h: New file.
>
> OK.

On second thought, should we rather use a (pre-reload?)
define_insn_and_split to split the combination into the blend insn?

Uros.
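
For readers following the thread, the two claims quoted above (the integer compare against zero is redundant before a *blendv*, but x < 0.0 is not the same as signbit () for floating point) can be checked with a small scalar model. This is only a sketch and not part of the patch: blendv_lane and cmpgt_zero_lane are hypothetical helpers that model a single byte lane of pblendvb and of pcmpgtb against zero, and the floating-point check assumes an IEEE 754 target compiled without -ffast-math.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar model of one byte lane of pblendvb:
   the lane takes b when the mask byte's most significant bit is set,
   otherwise a.  The low seven mask bits are ignored.  */
static uint8_t blendv_lane(uint8_t a, uint8_t b, uint8_t mask)
{
    return (mask & 0x80) ? b : a;
}

/* Hypothetical scalar model of one byte lane of pcmpgtb computing 0 > x:
   all-ones when x is negative, all-zeros otherwise.  */
static uint8_t cmpgt_zero_lane(int8_t x)
{
    return (x < 0) ? 0xFF : 0x00;
}

/* Returns 1 if, for every 8-bit x, blending on the compare result gives
   the same lane as blending on x itself (i.e. the compare is redundant).  */
static int compare_is_redundant(void)
{
    for (int v = -128; v <= 127; v++) {
        int8_t x = (int8_t)v;
        uint8_t with_cmp = blendv_lane(0x11, 0x22, cmpgt_zero_lane(x));
        uint8_t without_cmp = blendv_lane(0x11, 0x22, (uint8_t)x);
        if (with_cmp != without_cmp)
            return 0;
    }
    return 1;
}

/* Returns 1 if "x < 0.0" and the sign bit disagree for x == -0.0,
   which is why the same shortcut is not applied to FP comparisons.  */
static int negative_zero_differs(void)
{
    double x = -0.0;
    return (x < 0.0) != (signbit(x) != 0);
}

/* Convenience self-check combining both observations.  */
static void self_check(void)
{
    assert(compare_is_redundant() == 1);
    assert(negative_zero_differs() == 1);
}
```

Calling self_check () from a main function should pass silently under the stated assumptions: the integer case holds for all 256 byte values, while -0.0 has its sign bit set yet does not compare less than 0.0.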