Re: [PATCH] Optimize integral lt + blend into just blend (PR target/54700)

Uros Bizjak Thu, 29 Nov 2018 08:23:01 -0800

On Thu, Nov 29, 2018 at 10:54 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Thu, Nov 29, 2018 at 9:00 AM Jakub Jelinek <ja...@redhat.com> wrote:
> >
> > Hi!
> >
> > The following patch optimizes
> > -       pxor    %xmm3, %xmm3
> > -       pcmpgtb %xmm0, %xmm3
> > -       movdqa  %xmm3, %xmm0
> >         pblendvb        %xmm0, %xmm1, %xmm2
> >         movdqa  %xmm2, %xmm0
> >         ret
> >
> > -       vpxor   %xmm3, %xmm3, %xmm3
> > -       vpcmpgtq        %ymm0, %ymm3, %ymm0
> > -       vpblendvb       %ymm0, %ymm2, %ymm1, %ymm0
> > +       vblendvpd       %ymm0, %ymm2, %ymm1, %ymm0
> >         ret
> >
> > etc.  As the *blendv* instructions only look at the most significant
> > bit, we don't really need to perform pcmpgt* or vpcmpgt* instructions;
> > while they set also the other bits based on the most significant one,
> > the only consumer doesn't care about those other bits.
> >
> > I believe we can't do this for floating point comparisons even with
> > -ffast-math, because -fno-signed-zeros isn't a guarantee that -0.0 won't
> > appear, just that it will appear randomly when 0.0 is wanted and vice versa,
> > and having x < 0.0 be suddenly false if x is -0.0 would IMHO break too much
> > code.
>
> I agree with the above. This would mean that a comparison x < 0.0
> would be substituted with an equivalent to a signbit (). We don't do
> this even for -ffast-math or -funsafe-math-optimizations.
>
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2018-11-28  Jakub Jelinek  <ja...@redhat.com>
> >
> >         PR target/54700
> >         * config/i386/sse.md (ssebytemode): Add V16SI, V8SI and V4SI
> >         entries.
> >         (ssefltmodesuffix, ssefltvecmode): New define_mode_attrs.
> >         (*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_lt,
> >         *<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_ltint,
> >         *<sse4_1_avx2>_pblendvb_lt): New define_insns.
> >
> >         * g++.target/i386/sse4_1-pr54700-1.C: New test.
> >         * g++.target/i386/sse4_1-pr54700-2.C: New test.
> >         * g++.target/i386/avx-pr54700-1.C: New test.
> >         * g++.target/i386/avx-pr54700-2.C: New test.
> >         * g++.target/i386/avx2-pr54700-1.C: New test.
> >         * g++.target/i386/avx2-pr54700-2.C: New test.
> >         * g++.target/i386/sse4_1-check.h: New file.
> >         * g++.target/i386/avx-check.h: New file.
> >         * g++.target/i386/avx2-check.h: New file.
> >         * g++.target/i386/m128-check.h: New file.
> >         * g++.target/i386/m256-check.h: New file.
> >         * g++.target/i386/avx-os-support.h: New file.
>
> OK.


On second thought, should we rather use a (pre-reload?)
define_insn_and_split to split the combination into the blend insn?

Uros.
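
For readers following the reasoning quoted above, here is a minimal scalar
sketch in C, not taken from the patch itself. It models a single byte lane
of pblendvb and of pcmpgtb against zero, and checks that blending on the
compare result and blending on the original value pick the same operand
for every byte value. It also shows why the same shortcut is not valid for
floating point: -0.0 has its sign bit set, yet -0.0 < 0.0 is false. All
function names here (blendv_byte, lt_zero_mask, count_integer_mismatches,
float_lt_differs_from_signbit) are hypothetical and exist only for this
illustration.

```c
#include <math.h>
#include <stdint.h>

/* Scalar model of one byte lane of pblendvb: only bit 7 of the mask
   is examined; the remaining mask bits are ignored. */
uint8_t blendv_byte(uint8_t mask, uint8_t if_clear, uint8_t if_set)
{
    return (mask & 0x80) ? if_set : if_clear;
}

/* Scalar model of one byte lane of "pcmpgtb x, zero": all ones when
   x < 0, all zeros otherwise. */
uint8_t lt_zero_mask(int8_t x)
{
    return (x < 0) ? 0xFF : 0x00;
}

/* Count the byte values for which blending on the compare mask and
   blending directly on x select a different operand. */
int count_integer_mismatches(void)
{
    int mismatches = 0;
    for (int v = -128; v <= 127; v++) {
        int8_t x = (int8_t)v;
        uint8_t with_cmp = blendv_byte(lt_zero_mask(x), 0x11, 0x22);
        uint8_t without_cmp = blendv_byte((uint8_t)x, 0x11, 0x22);
        if (with_cmp != without_cmp)
            mismatches++;
    }
    return mismatches;
}

/* Return 1 when "x < 0.0" and "sign bit of x is set" disagree. */
int float_lt_differs_from_signbit(double x)
{
    return (x < 0.0) != (signbit(x) != 0);
}
```

In this model count_integer_mismatches() finds no differing byte value,
while float_lt_differs_from_signbit(-0.0) reports a disagreement, which
matches the argument for restricting the transformation to integer
comparisons.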
