On Tue, Oct 24, 2017 at 4:46 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Oct 24, 2017 at 05:44:44AM -0700, H.J. Lu wrote:
>> > What I can see from config/atom.md:
>> > ;; if palignr or psrldq
>> > (define_insn_reservation  "atom_sseishft_2" 1
>> >   (and (eq_attr "cpu" "atom")
>> >        (and (eq_attr "type" "sseishft")
>> >             (and (eq_attr "atom_unit" "sishuf")
>> >                  (match_operand 2 "immediate_operand"))))
>> >   "atom-simple-0")
>> >
>> > This leads back to initial commit of atom.md.
>> > So, discrimination of psrldq and pslldq looks intentional.
>> >
>> > On the over hand, I see in Software Optimization Guide, Table 14-2 that
>> > PSRLDQ and PSLLDQ occupy same line which directs both insns to port-0 (p 
>> > 14-18).
>> > So, looking from that point, definition for PSLLDQ which allow either of 
>> > port-0
>> > and port-1 looks wrong (atom-simple-either reservation).
>> >
>> > In absence of other information, I'd play on safe side and leave things as 
>> > they
>> > occur right now.
>> >
>>
>> I prefer to leave atom.md ASIS.  As for (set_attr "atom_unit"
>> "sishuf"), it was added
>> for
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44615
>>
>> You can drop (set_attr "atom_unit" "sishuf") if gcc.target/i386/sse2-vec-2a.c
>> still compiles.
>
> No, it was added earlier than that, that PR was about insns with psrldq with
> implicit immediate (which don't have a CONST_INT operands[2]).  This insn
> does have it, the testcase passes regardless of whether sishuf or other is
> used, it is purely a tuning thing.
>
> In any case, here is an updated patch that just preserves the status quo
> (psrldq having the sishuf unit, pslldq not) using a simple code attribute.

Agner Fog's tables confirm Jakub's observation:

PSLL/RL/RAW/D/Q (x)mm,(x)mm 2 FP0 5 5
PSLL/RL/RAW/D/Q (x)xmm,i 1 FP0 1 1
PSLL/RLDQ xmm,i 1 FP0 1 1

I fail to see how could left and right shifts use different units.
Since the test passes, let's change pslldq to use sishuf unit. There
is no better alternative from the list of units.

> 2017-10-24  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/82370
>         * config/i386/sse.md (VIMAX_AVX2): Remove V4TImode.
>         (VIMAX_AVX2_AVX512BW, VIMAX_AVX512VL): New mode iterators.
>         (vec_shl_<mode>): Remove unused expander.
>         (avx512bw_<shift_insn><mode>3): New define_insn.
>         (atom_shift_unit): New code iterator.
>         (<sse2_avx2>_ashl<mode>3, <sse2_avx2>_lshr<mode>3): Replaced by ...
>         (<sse2_avx2>_<shift_insn><mode>3): ... this.  New define_insn.
>
>         * gcc.target/i386/pr82370.c: New test.

OK with the change od pslldq's unit to sishuf.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2017-10-20 16:30:35.286208652 +0200
> +++ gcc/config/i386/sse.md      2017-10-24 16:29:54.848934888 +0200
> @@ -371,10 +371,17 @@ (define_mode_iterator V16FI
>    [V16SF V16SI])
>
>  ;; ??? We should probably use TImode instead.
> -(define_mode_iterator VIMAX_AVX2
> +(define_mode_iterator VIMAX_AVX2_AVX512BW
>    [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") V1TI])
>
> -;; ??? This should probably be dropped in favor of VIMAX_AVX2.
> +;; Suppose TARGET_AVX512BW as baseline
> +(define_mode_iterator VIMAX_AVX512VL
> +  [V4TI (V2TI "TARGET_AVX512VL") (V1TI "TARGET_AVX512VL")])
> +
> +(define_mode_iterator VIMAX_AVX2
> +  [(V2TI "TARGET_AVX2") V1TI])
> +
> +;; ??? This should probably be dropped in favor of VIMAX_AVX2_AVX512BW.
>  (define_mode_iterator SSESCALARMODE
>    [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") TI])
>
> @@ -10778,9 +10785,9 @@ (define_insn "<shift_insn><mode>3<mask_n
>     (set_attr "mode" "<sseinsnmode>")])
>
>
> -(define_expand "vec_shl_<mode>"
> +(define_expand "vec_shr_<mode>"
>    [(set (match_dup 3)
> -       (ashift:V1TI
> +       (lshiftrt:V1TI
>          (match_operand:VI_128 1 "register_operand")
>          (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
>     (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> @@ -10791,48 +10798,26 @@ (define_expand "vec_shl_<mode>"
>    operands[4] = gen_lowpart (<MODE>mode, operands[3]);
>  })
>
> -(define_insn "<sse2_avx2>_ashl<mode>3"
> -  [(set (match_operand:VIMAX_AVX2 0 "register_operand" "=x,v")
> -       (ashift:VIMAX_AVX2
> -        (match_operand:VIMAX_AVX2 1 "register_operand" "0,v")
> -        (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n,n")))]
> -  "TARGET_SSE2"
> +(define_insn "avx512bw_<shift_insn><mode>3"
> +  [(set (match_operand:VIMAX_AVX512VL 0 "register_operand" "=v")
> +       (any_lshift:VIMAX_AVX512VL
> +        (match_operand:VIMAX_AVX512VL 1 "nonimmediate_operand" "vm")
> +        (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n")))]
> +  "TARGET_AVX512BW"
>  {
>    operands[2] = GEN_INT (INTVAL (operands[2]) / 8);
> -
> -  switch (which_alternative)
> -    {
> -    case 0:
> -      return "pslldq\t{%2, %0|%0, %2}";
> -    case 1:
> -      return "vpslldq\t{%2, %1, %0|%0, %1, %2}";
> -    default:
> -      gcc_unreachable ();
> -    }
> +  return "vp<vshift>dq\t{%2, %1, %0|%0, %1, %2}";
>  }
> -  [(set_attr "isa" "noavx,avx")
> -   (set_attr "type" "sseishft")
> +  [(set_attr "type" "sseishft")
>     (set_attr "length_immediate" "1")
> -   (set_attr "prefix_data16" "1,*")
> -   (set_attr "prefix" "orig,vex")
> +   (set_attr "prefix" "maybe_evex")
>     (set_attr "mode" "<sseinsnmode>")])
>
> -(define_expand "vec_shr_<mode>"
> -  [(set (match_dup 3)
> -       (lshiftrt:V1TI
> -        (match_operand:VI_128 1 "register_operand")
> -        (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
> -   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> -  "TARGET_SSE2"
> -{
> -  operands[1] = gen_lowpart (V1TImode, operands[1]);
> -  operands[3] = gen_reg_rtx (V1TImode);
> -  operands[4] = gen_lowpart (<MODE>mode, operands[3]);
> -})
> +(define_code_attr atom_shift_unit [(ashift "*") (lshiftrt "sishuf")])
>
> -(define_insn "<sse2_avx2>_lshr<mode>3"
> +(define_insn "<sse2_avx2>_<shift_insn><mode>3"
>    [(set (match_operand:VIMAX_AVX2 0 "register_operand" "=x,v")
> -       (lshiftrt:VIMAX_AVX2
> +       (any_lshift:VIMAX_AVX2
>          (match_operand:VIMAX_AVX2 1 "register_operand" "0,v")
>          (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n,n")))]
>    "TARGET_SSE2"
> @@ -10842,9 +10827,9 @@ (define_insn "<sse2_avx2>_lshr<mode>3"
>    switch (which_alternative)
>      {
>      case 0:
> -      return "psrldq\t{%2, %0|%0, %2}";
> +      return "p<vshift>dq\t{%2, %0|%0, %2}";
>      case 1:
> -      return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
> +      return "vp<vshift>dq\t{%2, %1, %0|%0, %1, %2}";
>      default:
>        gcc_unreachable ();
>      }
> @@ -10852,7 +10837,7 @@ (define_insn "<sse2_avx2>_lshr<mode>3"
>    [(set_attr "isa" "noavx,avx")
>     (set_attr "type" "sseishft")
>     (set_attr "length_immediate" "1")
> -   (set_attr "atom_unit" "sishuf")
> +   (set_attr "atom_unit" "<atom_shift_unit>")
>     (set_attr "prefix_data16" "1,*")
>     (set_attr "prefix" "orig,vex")
>     (set_attr "mode" "<sseinsnmode>")])
> --- gcc/testsuite/gcc.target/i386/pr82370.c.jj  2017-10-24 16:22:16.665464886 
> +0200
> +++ gcc/testsuite/gcc.target/i386/pr82370.c     2017-10-24 16:22:16.665464886 
> +0200
> @@ -0,0 +1,18 @@
> +/* PR target/82370 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512vl -mavx512bw -masm=att" } */
> +/* { dg-final { scan-assembler-times "vpslldq\[ \t]\+\\\$5, 
> \\(%\[a-z0-9,]*\\), %xmm\[0-9]\+" 1 } } */
> +/* { dg-final { scan-assembler-times "vpsrldq\[ \t]\+\\\$5, 
> \\(%\[a-z0-9,]*\\), %xmm\[0-9]\+" 1 } } */
> +/* { dg-final { scan-assembler-times "vpslldq\[ \t]\+\\\$5, 
> \\(%\[a-z0-9,]*\\), %ymm\[0-9]\+" 1 } } */
> +/* { dg-final { scan-assembler-times "vpsrldq\[ \t]\+\\\$5, 
> \\(%\[a-z0-9,]*\\), %ymm\[0-9]\+" 1 } } */
> +/* { dg-final { scan-assembler-times "vpslldq\[ \t]\+\\\$5, 
> \\(%\[a-z0-9,]*\\), %zmm\[0-9]\+" 1 } } */
> +/* { dg-final { scan-assembler-times "vpsrldq\[ \t]\+\\\$5, 
> \\(%\[a-z0-9,]*\\), %zmm\[0-9]\+" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +__m512i f1 (__m512i *x) { return _mm512_bslli_epi128 (*x, 5); }
> +__m512i f2 (__m512i *x) { return _mm512_bsrli_epi128 (*x, 5); }
> +__m256i f3 (__m256i *x) { return _mm256_bslli_epi128 (*x, 5); }
> +__m256i f4 (__m256i *x) { return _mm256_bsrli_epi128 (*x, 5); }
> +__m128i f5 (__m128i *x) { return _mm_bslli_si128 (*x, 5); }
> +__m128i f6 (__m128i *x) { return _mm_bsrli_si128 (*x, 5); }
>
>
>         Jakub

Reply via email to