PING^2: [PATCH] i386: Generate standard floating point scalar operation patterns

H.J. Lu Tue, 18 Jun 2019 09:01:34 -0700

On Mon, Jun 3, 2019 at 3:50 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Tue, May 21, 2019 at 8:54 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Wed, May 15, 2019 at 2:29 PM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> > >
> > > "H.J. Lu" <hjl.to...@gmail.com> writes:
> > > > On Thu, Feb 7, 2019 at 9:49 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >>
> > > >> Standard scalar operation patterns which preserve the rest of the vector
> > > >> look like
> > > >>
> > > >>      (vec_merge:V2DF
> > > >>        (vec_duplicate:V2DF
> > > >>          (op:DF (vec_select:DF (reg/v:V2DF 85 [ x ])
> > > >>                 (parallel [ (const_int 0 [0])]))
> > > >>          (reg:DF 87)))
> > > >>        (reg/v:V2DF 85 [ x ])
> > > >>        (const_int 1 [0x1]))
> > > >>
> > > >> Add such patterns to the i386 backend and convert VEC_CONCAT patterns to
> > > >> standard scalar operation patterns.
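
At the source level, the pattern above corresponds to an operation on
element 0 of a vector that leaves the remaining elements untouched.  A
minimal sketch using the GCC generic vector extension follows.  The type
name v2df_t and the helper names are made up for illustration and are
not part of the patch.

```c
#include <assert.h>

/* Two doubles in one 16-byte vector (GCC/Clang vector extension).
   The type and function names here are illustrative only.  */
typedef double v2df_t __attribute__ ((vector_size (16)));

/* Each helper updates element 0 and passes element 1 through.  */
static v2df_t add_lane0 (v2df_t x, double d) { x[0] += d; return x; }
static v2df_t sub_lane0 (v2df_t x, double d) { x[0] -= d; return x; }
static v2df_t mul_lane0 (v2df_t x, double d) { x[0] *= d; return x; }
static v2df_t div_lane0 (v2df_t x, double d) { x[0] /= d; return x; }

/* Self-check with exactly representable values.  */
static void
check_lane0_arith (void)
{
  v2df_t x = { 6.0, 8.0 };
  v2df_t r;

  r = add_lane0 (x, 2.0);
  assert (r[0] == 8.0 && r[1] == 8.0);
  r = sub_lane0 (x, 2.0);
  assert (r[0] == 4.0 && r[1] == 8.0);
  r = mul_lane0 (x, 2.0);
  assert (r[0] == 12.0 && r[1] == 8.0);
  r = div_lane0 (x, 2.0);
  assert (r[0] == 3.0 && r[1] == 8.0);
}
```

Calling check_lane0_arith () from a test driver exercises all four
helpers.  Which instructions a given compiler emits for them is not
something this sketch asserts.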
> > >
> > > It looks like there's some variety in the patterns used, e.g.:
> > >
> > > (define_insn "<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>"
> > >   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> > >         (vec_merge:VF_128
> > >           (smaxmin:VF_128
> > >             (match_operand:VF_128 1 "register_operand" "0,v")
> > >             (match_operand:VF_128 2 "vector_operand" "xBm,<round_saeonly_scalar_constraint>"))
> > >          (match_dup 1)
> > >          (const_int 1)))]
> > >   "TARGET_SSE"
> > >   "@
> > >    <maxmin_float><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
> > >    v<maxmin_float><ssescalarmodesuffix>\t{<round_saeonly_scalar_mask_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2<round_saeonly_scalar_mask_op3>}"
> > >   [(set_attr "isa" "noavx,avx")
> > >    (set_attr "type" "sse")
> > >    (set_attr "btver2_sse_attr" "maxmin")
> > >    (set_attr "prefix" "<round_saeonly_scalar_prefix>")
> > >    (set_attr "mode" "<ssescalarmode>")])
> > >
> > > makes the operand a full vector operation, which seems simpler.
> >
> > This pattern is used to implement scalar smaxmin intrinsics.
> >
> > > The above would then be:
> > >
> > >       (vec_merge:V2DF
> > >         (op:V2DF
> > >           (reg:V2DF 85)
> > >           (vec_duplicate:V2DF (reg:DF 87)))
> > >         (reg/v:V2DF 85 [ x ])
> > >         (const_int 1 [0x1]))
> > >
> > > I guess technically the two have different faulting behaviour though,
> > > since the smaxmin gets applied to all elements, not just element 0.
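
To make the comparison concrete, here is a C sketch of the two RTL
shapes under discussion, with + standing in for op.  All names are
hypothetical.  Both functions return the same vector; the second also
evaluates the operation on element 1 and then discards that result,
which is where the faulting behaviour mentioned above could differ.

```c
#include <assert.h>

/* Two doubles in one 16-byte vector (GCC/Clang vector extension).  */
typedef double v2df_t __attribute__ ((vector_size (16)));

/* Form 1: scalar op on element 0, then merge the result into x.  */
static v2df_t
scalar_then_merge (v2df_t x, double d)
{
  double s = x[0] + d;      /* op:DF on (vec_select x 0) and d */
  v2df_t r = x;             /* element 1 comes from x */
  r[0] = s;                 /* vec_merge with mask 1 */
  return r;
}

/* Form 2: full-vector op, then keep only element 0 of the result.  */
static v2df_t
vector_then_merge (v2df_t x, double d)
{
  v2df_t dup = { d, d };    /* vec_duplicate of the scalar */
  v2df_t full = x + dup;    /* op:V2DF applied to both elements */
  v2df_t r = x;             /* element 1 comes from x */
  r[0] = full[0];           /* vec_merge with mask 1 */
  return r;
}

/* Self-check: the two forms agree on the returned values.  */
static void
check_forms_agree (void)
{
  v2df_t x = { 1.5, 8.0 };
  v2df_t a = scalar_then_merge (x, 2.0);
  v2df_t b = vector_then_merge (x, 2.0);

  assert (a[0] == 3.5 && a[1] == 8.0);
  assert (b[0] == a[0] && b[1] == a[1]);
}
```

The sketch only compares returned values.  It does not observe
floating-point exception flags, so it illustrates the structural
difference rather than measuring it.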
> >
> > This is the issue.  We don't use the correct mode for scalar instructions:
> >
> > ---
> > #include <immintrin.h>
> >
> > __m128d
> > foo1 (__m128d x, double *p)
> > {
> >   __m128d y = _mm_load_sd (p);
> >   return _mm_max_pd (x, y);
> > }
> > ---
> >
> > movq (%rdi), %xmm1
> > maxpd %xmm1, %xmm0
> > ret
> >
> >
> > Here is the updated patch to add standard floating point scalar
> > operation patterns to the i386 backend.  Then we can do
> >
> > ---
> > #include <immintrin.h>
> >
> > extern __inline __m128d __attribute__((__gnu_inline__,
> > __always_inline__, __artificial__))
> > _new_mm_max_pd (__m128d __A, __m128d __B)
> > {
> >   __A[0] = __A[0] > __B[0] ? __A[0] : __B[0];
> >   return __A;
> > }
> >
> > __m128d
> > foo2 (__m128d x, double *p)
> > {
> >   __m128d y = _mm_load_sd (p);
> >   return _new_mm_max_pd (x, y);
> > }
> > ---
> >
> > maxsd (%rdi), %xmm0
> > ret
> >
> > We should use generic vector operations to implement i386 intrinsics
> > as much as we can.
> >
> > > The patch seems very specific.  E.g. why just PLUS, MINUS, MULT and DIV?
> >
> > This patch only adds +, -, *, /, > and <.  We can add more if there are
> > testcases for them.
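
The < case mirrors the _new_mm_max_pd example above.  A sketch of a
minimum on element 0, written with the GCC generic vector extension;
min_lane0 and v2df_t are hypothetical names, not intrinsics and not
part of the patch.

```c
#include <assert.h>

/* Two doubles in one 16-byte vector (GCC/Clang vector extension).  */
typedef double v2df_t __attribute__ ((vector_size (16)));

/* Minimum of element 0 of a and b; element 1 of a is preserved.  */
static v2df_t
min_lane0 (v2df_t a, v2df_t b)
{
  a[0] = a[0] < b[0] ? a[0] : b[0];
  return a;
}

/* Self-check covering both outcomes of the comparison.  */
static void
check_min_lane0 (void)
{
  v2df_t hi = { 3.0, 7.0 };
  v2df_t lo = { 1.0, 9.0 };
  v2df_t r;

  r = min_lane0 (hi, lo);       /* b[0] is smaller */
  assert (r[0] == 1.0 && r[1] == 7.0);
  r = min_lane0 (lo, hi);       /* a[0] is smaller */
  assert (r[0] == 1.0 && r[1] == 9.0);
}
```

Element 1 of the second argument never reaches the result, which is
the property the scalar patterns are meant to capture.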
> >
> > > Thanks,
> > > Richard
> > >
> > >
> > > >>
> > > >> gcc/
> > > >>
> > > >>         PR target/54855
> > > >>         * simplify-rtx.c (simplify_binary_operation_1): Convert
> > > >>         VEC_CONCAT patterns to standard scalar operation
> > > >>         patterns.
> > > >>         * config/i386/sse.md (*<sse>_vm<plusminus_insn><mode>3): New.
> > > >>         (*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
> > > >>
> > > >> gcc/testsuite/
> > > >>
> > > >>         PR target/54855
> > > >>         * gcc.target/i386/pr54855-1.c: New test.
> > > >>         * gcc.target/i386/pr54855-2.c: Likewise.
> > > >>         * gcc.target/i386/pr54855-3.c: Likewise.
> > > >>         * gcc.target/i386/pr54855-4.c: Likewise.
> > > >>         * gcc.target/i386/pr54855-5.c: Likewise.
> > > >>         * gcc.target/i386/pr54855-6.c: Likewise.
> > > >>         * gcc.target/i386/pr54855-7.c: Likewise.
> > > >
> > > > PING:
> > > >
> > > > https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00398.html
> >
> > Thanks.
> >
>
> PING:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01416.html
>


PING.

-- 
H.J.
