Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

Richard Biener via Gcc-patches Wed, 20 Oct 2021 01:07:05 -0700

On Wed, Oct 20, 2021 at 9:48 AM Xu Dianhong <dianho...@gmail.com> wrote:
>
> Thanks for the comments.
>
> > And does it even work?
> It works, I checked it in the test case, and when using this option, it can 
> emit an unaligned vector move.
> >I fail to see adjustments to memory operands of
> SSE/AVX instructions that have to be aligned
> I changed all vector move in "get_ssemov" without checking the move with 
> memory operands or not.
> >and now would need to be
> pushed to separate unaligned moves with an extra register?
> I think it did not use an extra register. I'm not sure if I got your 
> question, and this patch just change the final operator of SSE MOVE from 
> aligned operator to unaligned operator, and I did not change the operands.


For example

typedef double v2df __attribute__((vector_size(16)));

v2df a, b;

void foo ()
{
  a += b;
}

will compile to

foo:
.LFB0:
        .cfi_startproc
        movapd  a(%rip), %xmm0
        addpd   b(%rip), %xmm0
        movaps  %xmm0, a(%rip)
        ret

what should -muse-unaligned-vector-move do here?  The addpd b(%rip), %xmm0
instruction implies an aligned move from b(%rip).

It looks your patch could be better implemented in the assembler, just using
the unaligned encodings for aligned moves?

Richard.

> On Wed, Oct 20, 2021 at 3:04 PM Richard Biener <richard.guent...@gmail.com> 
> wrote:
>>
>> On Wed, Oct 20, 2021 at 9:02 AM Richard Biener
>> <richard.guent...@gmail.com> wrote:
>> >
>> > On Wed, Oct 20, 2021 at 7:31 AM dianhong.xu--- via Gcc-patches
>> > <gcc-patches@gcc.gnu.org> wrote:
>> > >
>> > > From: dianhong xu <dianhong...@intel.com>
>> > >
>> > > Add -muse-unaligned-vector-move option to emit unaligned vector move
>> > > instaructions.
>> >
>> > Why would you ever want to have such option?!  Should the documentation
>> > at least read "emit unaligned vector moves even for aligned storage or when
>> > using aligned move intrinsics"?
>>
>> And does it even work?  I fail to see adjustments to memory operands of
>> SSE/AVX instructions that have to be aligned and now would need to be
>> pushed to separate unaligned moves with an extra register?
>>
>> Richard.
>>
>> >
>> > Richard.
>> >
>> > > gcc/ChangeLog:
>> > >
>> > >         * config/i386/i386-options.c (ix86_target_string): Add
>> > >         -muse-unaligned-vector-move.
>> > >         * config/i386/i386.c (ix86_get_ssemov): Emit unaligned vector if 
>> > > use
>> > >         the new option.
>> > >         * config/i386/i386.opt (muse-unaligned-vector-move): New.
>> > >         * config/i386/sse.md: Emit unaligned vector if use this new 
>> > > option
>> > >         * doc/invoke.texi: Document -muse-unaligned-vector-move
>> > >
>> > > gcc/testsuite/ChangeLog:
>> > >
>> > >         * gcc.target/i386/avx2-vector-unaligned-load-store-1.c: New test.
>> > >         * gcc.target/i386/avx2-vector-unaligned-load-store-2.c: New test.
>> > >         * gcc.target/i386/avx2-vector-unaligned-load-store-3.c: New test.
>> > >         * gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c: New 
>> > > test.
>> > > ---
>> > >  gcc/config/i386/i386-options.c                |   3 +-
>> > >  gcc/config/i386/i386.c                        |  41 +++----
>> > >  gcc/config/i386/i386.opt                      |   4 +
>> > >  gcc/config/i386/sse.md                        |  30 +++--
>> > >  gcc/doc/invoke.texi                           |   7 ++
>> > >  .../i386/avx2-vector-unaligned-load-store-1.c | 102 +++++++++++++++++
>> > >  .../i386/avx2-vector-unaligned-load-store-2.c | 107 ++++++++++++++++++
>> > >  .../i386/avx2-vector-unaligned-load-store-3.c |  11 ++
>> > >  .../avx512vl-vector-unaligned-load-store-1.c  |  13 +++
>> > >  9 files changed, 287 insertions(+), 31 deletions(-)
>> > >  create mode 100644 
>> > > gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c
>> > >  create mode 100644 
>> > > gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c
>> > >  create mode 100644 
>> > > gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c
>> > >  create mode 100644 
>> > > gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c
>> > >
>> > > diff --git a/gcc/config/i386/i386-options.c 
>> > > b/gcc/config/i386/i386-options.c
>> > > index c9523b26f49..eacbd0f5451 100644
>> > > --- a/gcc/config/i386/i386-options.c
>> > > +++ b/gcc/config/i386/i386-options.c
>> > > @@ -397,7 +397,8 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT 
>> > > isa2,
>> > >      { "-mstv",                         MASK_STV },
>> > >      { "-mavx256-split-unaligned-load", MASK_AVX256_SPLIT_UNALIGNED_LOAD 
>> > > },
>> > >      { "-mavx256-split-unaligned-store",        
>> > > MASK_AVX256_SPLIT_UNALIGNED_STORE },
>> > > -    { "-mcall-ms2sysv-xlogues",                
>> > > MASK_CALL_MS2SYSV_XLOGUES }
>> > > +    { "-mcall-ms2sysv-xlogues",                
>> > > MASK_CALL_MS2SYSV_XLOGUES },
>> > > +    { "-muse-unaligned-vector-move",   MASK_USE_UNALIGNED_VECTOR_MOVE }
>> > >    };
>> > >
>> > >    /* Additional flag options.  */
>> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> > > index f111411e599..7581e854021 100644
>> > > --- a/gcc/config/i386/i386.c
>> > > +++ b/gcc/config/i386/i386.c
>> > > @@ -5323,8 +5323,9 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>> > >                  enum attr_mode insn_mode, machine_mode mode)
>> > >  {
>> > >    char buf[128];
>> > > -  bool misaligned_p = (misaligned_operand (operands[0], mode)
>> > > -                      || misaligned_operand (operands[1], mode));
>> > > +  bool need_unaligned_p = (TARGET_USE_UNALIGNED_VECTOR_MOVE
>> > > +                          || misaligned_operand (operands[0], mode)
>> > > +                          || misaligned_operand (operands[1], mode));
>> > >    bool evex_reg_p = (size == 64
>> > >                      || EXT_REX_SSE_REG_P (operands[0])
>> > >                      || EXT_REX_SSE_REG_P (operands[1]));
>> > > @@ -5380,17 +5381,17 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>> > >         {
>> > >         case opcode_int:
>> > >           if (scalar_mode == E_HFmode)
>> > > -           opcode = (misaligned_p
>> > > +           opcode = (need_unaligned_p
>> > >                       ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
>> > >                       : "vmovdqa64");
>> > >           else
>> > > -           opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
>> > > +           opcode = need_unaligned_p ? "vmovdqu32" : "vmovdqa32";
>> > >           break;
>> > >         case opcode_float:
>> > > -         opcode = misaligned_p ? "vmovups" : "vmovaps";
>> > > +         opcode = need_unaligned_p ? "vmovups" : "vmovaps";
>> > >           break;
>> > >         case opcode_double:
>> > > -         opcode = misaligned_p ? "vmovupd" : "vmovapd";
>> > > +         opcode = need_unaligned_p ? "vmovupd" : "vmovapd";
>> > >           break;
>> > >         }
>> > >      }
>> > > @@ -5399,21 +5400,21 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>> > >        switch (scalar_mode)
>> > >         {
>> > >         case E_HFmode:
>> > > -         opcode = (misaligned_p
>> > > +         opcode = (need_unaligned_p
>> > >                     ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
>> > >                     : "vmovdqa64");
>> > >           break;
>> > >         case E_SFmode:
>> > > -         opcode = misaligned_p ? "%vmovups" : "%vmovaps";
>> > > +         opcode = need_unaligned_p ? "%vmovups" : "%vmovaps";
>> > >           break;
>> > >         case E_DFmode:
>> > > -         opcode = misaligned_p ? "%vmovupd" : "%vmovapd";
>> > > +         opcode = need_unaligned_p ? "%vmovupd" : "%vmovapd";
>> > >           break;
>> > >         case E_TFmode:
>> > >           if (evex_reg_p)
>> > > -           opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
>> > > +           opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64";
>> > >           else
>> > > -           opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
>> > > +           opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa";
>> > >           break;
>> > >         default:
>> > >           gcc_unreachable ();
>> > > @@ -5425,13 +5426,13 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>> > >         {
>> > >         case E_QImode:
>> > >           if (evex_reg_p)
>> > > -           opcode = (misaligned_p
>> > > +           opcode = (need_unaligned_p
>> > >                       ? (TARGET_AVX512BW
>> > >                          ? "vmovdqu8"
>> > >                          : "vmovdqu64")
>> > >                       : "vmovdqa64");
>> > >           else
>> > > -           opcode = (misaligned_p
>> > > +           opcode = (need_unaligned_p
>> > >                       ? (TARGET_AVX512BW
>> > >                          ? "vmovdqu8"
>> > >                          : "%vmovdqu")
>> > > @@ -5439,13 +5440,13 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>> > >           break;
>> > >         case E_HImode:
>> > >           if (evex_reg_p)
>> > > -           opcode = (misaligned_p
>> > > +           opcode = (need_unaligned_p
>> > >                       ? (TARGET_AVX512BW
>> > >                          ? "vmovdqu16"
>> > >                          : "vmovdqu64")
>> > >                       : "vmovdqa64");
>> > >           else
>> > > -           opcode = (misaligned_p
>> > > +           opcode = (need_unaligned_p
>> > >                       ? (TARGET_AVX512BW
>> > >                          ? "vmovdqu16"
>> > >                          : "%vmovdqu")
>> > > @@ -5453,20 +5454,20 @@ ix86_get_ssemov (rtx *operands, unsigned size,
>> > >           break;
>> > >         case E_SImode:
>> > >           if (evex_reg_p)
>> > > -           opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
>> > > +           opcode = need_unaligned_p ? "vmovdqu32" : "vmovdqa32";
>> > >           else
>> > > -           opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
>> > > +           opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa";
>> > >           break;
>> > >         case E_DImode:
>> > >         case E_TImode:
>> > >         case E_OImode:
>> > >           if (evex_reg_p)
>> > > -           opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
>> > > +           opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64";
>> > >           else
>> > > -           opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
>> > > +           opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa";
>> > >           break;
>> > >         case E_XImode:
>> > > -         opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
>> > > +         opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64";
>> > >           break;
>> > >         default:
>> > >           gcc_unreachable ();
>> > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
>> > > index ad366974b5b..2162d10925a 100644
>> > > --- a/gcc/config/i386/i386.opt
>> > > +++ b/gcc/config/i386/i386.opt
>> > > @@ -1170,3 +1170,7 @@ Support MWAIT and MONITOR built-in functions and 
>> > > code generation.
>> > >  mavx512fp16
>> > >  Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save
>> > >  Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F 
>> > > and AVX512FP16 built-in functions and code generation.
>> > > +
>> > > +muse-unaligned-vector-move
>> > > +Target Mask(USE_UNALIGNED_VECTOR_MOVE) Save
>> > > +Emit unaligned vector move instructions.
>> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>> > > index fbf056bf9e6..dc99597f195 100644
>> > > --- a/gcc/config/i386/sse.md
>> > > +++ b/gcc/config/i386/sse.md
>> > > @@ -17059,24 +17059,28 @@
>> > >    switch (<MODE>mode)
>> > >      {
>> > >      case E_V8DFmode:
>> > > -      if (misaligned_operand (operands[2], <ssequartermode>mode))
>> > > +      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
>> > > +         || misaligned_operand (operands[2], <ssequartermode>mode))
>> > >         return "vmovupd\t{%2, %x0|%x0, %2}";
>> > >        else
>> > >         return "vmovapd\t{%2, %x0|%x0, %2}";
>> > >      case E_V16SFmode:
>> > > -      if (misaligned_operand (operands[2], <ssequartermode>mode))
>> > > +      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
>> > > +         || misaligned_operand (operands[2], <ssequartermode>mode))
>> > >         return "vmovups\t{%2, %x0|%x0, %2}";
>> > >        else
>> > >         return "vmovaps\t{%2, %x0|%x0, %2}";
>> > >      case E_V8DImode:
>> > > -      if (misaligned_operand (operands[2], <ssequartermode>mode))
>> > > +      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
>> > > +         || misaligned_operand (operands[2], <ssequartermode>mode))
>> > >         return which_alternative == 2 ? "vmovdqu64\t{%2, %x0|%x0, %2}"
>> > >                                       : "vmovdqu\t{%2, %x0|%x0, %2}";
>> > >        else
>> > >         return which_alternative == 2 ? "vmovdqa64\t{%2, %x0|%x0, %2}"
>> > >                                       : "vmovdqa\t{%2, %x0|%x0, %2}";
>> > >      case E_V16SImode:
>> > > -      if (misaligned_operand (operands[2], <ssequartermode>mode))
>> > > +      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
>> > > +         || misaligned_operand (operands[2], <ssequartermode>mode))
>> > >         return which_alternative == 2 ? "vmovdqu32\t{%2, %x0|%x0, %2}"
>> > >                                       : "vmovdqu\t{%2, %x0|%x0, %2}";
>> > >        else
>> > > @@ -25238,27 +25242,32 @@
>> > >        switch (get_attr_mode (insn))
>> > >         {
>> > >         case MODE_V16SF:
>> > > -         if (misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > > +         if (TARGET_USE_UNALIGNED_VECTOR_MOVE
>> > > +             || misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > >             return "vmovups\t{%1, %t0|%t0, %1}";
>> > >           else
>> > >             return "vmovaps\t{%1, %t0|%t0, %1}";
>> > >         case MODE_V8DF:
>> > > -         if (misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > > +         if (TARGET_USE_UNALIGNED_VECTOR_MOVE
>> > > +             || misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > >             return "vmovupd\t{%1, %t0|%t0, %1}";
>> > >           else
>> > >             return "vmovapd\t{%1, %t0|%t0, %1}";
>> > >         case MODE_V8SF:
>> > > -         if (misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > > +         if (TARGET_USE_UNALIGNED_VECTOR_MOVE
>> > > +             || misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > >             return "vmovups\t{%1, %x0|%x0, %1}";
>> > >           else
>> > >             return "vmovaps\t{%1, %x0|%x0, %1}";
>> > >         case MODE_V4DF:
>> > > -         if (misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > > +         if (TARGET_USE_UNALIGNED_VECTOR_MOVE
>> > > +             || misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > >             return "vmovupd\t{%1, %x0|%x0, %1}";
>> > >           else
>> > >             return "vmovapd\t{%1, %x0|%x0, %1}";
>> > >         case MODE_XI:
>> > > -         if (misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > > +         if (TARGET_USE_UNALIGNED_VECTOR_MOVE
>> > > +             || misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > >             {
>> > >               if (which_alternative == 2)
>> > >                 return "vmovdqu\t{%1, %t0|%t0, %1}";
>> > > @@ -25277,7 +25286,8 @@
>> > >                 return "vmovdqa32\t{%1, %t0|%t0, %1}";
>> > >             }
>> > >         case MODE_OI:
>> > > -         if (misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > > +         if (TARGET_USE_UNALIGNED_VECTOR_MOVE
>> > > +             || misaligned_operand (operands[1], <ssehalfvecmode>mode))
>> > >             {
>> > >               if (which_alternative == 2)
>> > >                 return "vmovdqu\t{%1, %x0|%x0, %1}";
>> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> > > index 0cc8a8edd05..13777d62437 100644
>> > > --- a/gcc/doc/invoke.texi
>> > > +++ b/gcc/doc/invoke.texi
>> > > @@ -1418,6 +1418,7 @@ See RS/6000 and PowerPC Options.
>> > >  -mstack-protector-guard-offset=@var{offset} @gol
>> > >  -mstack-protector-guard-symbol=@var{symbol} @gol
>> > >  -mgeneral-regs-only  -mcall-ms2sysv-xlogues @gol
>> > > +-muse-unaligned-vector-move @gol
>> > >  -mindirect-branch=@var{choice}  -mfunction-return=@var{choice} @gol
>> > >  -mindirect-branch-register -mneeded}
>> > >
>> > > @@ -31808,6 +31809,12 @@ resulting in fairly lengthy prologues and 
>> > > epilogues.  Using
>> > >  use stubs in the static portion of libgcc to perform these saves and 
>> > > restores,
>> > >  thus reducing function size at the cost of a few extra instructions.
>> > >
>> > > +@item -muse-unaligned-vector-move
>> > > +@opindex muse-unaligned-vector-move
>> > > +@opindex mno-use-unaligned-vector-move
>> > > +Use @option{-muse-unaligned-vector-move} to emits unaligned vector move
>> > > +instructions like vmovdqu, vmovups, vmovupd.
>> > > +
>> > >  @item -mtls-dialect=@var{type}
>> > >  @opindex mtls-dialect
>> > >  Generate code to access thread-local storage using the @samp{gnu} or
>> > > diff --git 
>> > > a/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c 
>> > > b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c
>> > > new file mode 100644
>> > > index 00000000000..d21eee562ac
>> > > --- /dev/null
>> > > +++ b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c
>> > > @@ -0,0 +1,102 @@
>> > > +/* { dg-do compile } */
>> > > +/* { dg-options "-O3 -mavx2 -muse-unaligned-vector-move" } */
>> > > +
>> > > +#define N 1024
>> > > +
>> > > +char **cp;
>> > > +char **ep;
>> > > +char **fp;
>> > > +
>> > > +void
>> > > +test_char ()
>> > > +{
>> > > +  int i;
>> > > +  char **ap = __builtin_assume_aligned (ep, 32);
>> > > +  char **zp;
>> > > +  for (i = 128; i > 0; i--)
>> > > +  {
>> > > +    *ap++ = *cp++;
>> > > +    *zp++ = *fp++;
>> > > +  }
>> > > +}
>> > > +
>> > > +float f1[N], f2[N], f3[N];
>> > > +
>> > > +void
>> > > +test_float (void)
>> > > +{
>> > > +  for (int i = 0; i < N; i++)
>> > > +  {
>> > > +    f3[i] = f1[i] * f2[i];
>> > > +  }
>> > > +}
>> > > +
>> > > +double d1[N], d2[N], d3[N];
>> > > +
>> > > +void
>> > > +test_double_load (void)
>> > > +{
>> > > +  for (int i = 0; i < N; i++)
>> > > +  {
>> > > +    d3[i] = d1[i] * d2[i];
>> > > +
>> > > +  }
>> > > +}
>> > > +
>> > > +unsigned char uc1[N], uc2[N], uc3[N];
>> > > +void
>> > > +test_unchar ()
>> > > +{
>> > > +   for (int i=0;i<N;i++) {
>> > > +     uc3[i] = uc1[i] * uc2[i];
>> > > +   }
>> > > +}
>> > > +
>> > > +short st1[N], st2[N], st3[N];
>> > > +void
>> > > +test_short ()
>> > > +{
>> > > +   for (int i=0;i<N;i++) {
>> > > +     st3[i] = st1[i] * st2[i];
>> > > +   }
>> > > +}
>> > > +
>> > > +int n1[N], n2[N], n3[N];
>> > > +void
>> > > +test_int ()
>> > > +{
>> > > +   for (int i=0;i<N;i++) {
>> > > +     n3[i] = n1[i] * n2[i];
>> > > +   }
>> > > +}
>> > > +
>> > > +long l1[N], l2[N], l3[N];
>> > > +
>> > > +void
>> > > +test_long ()
>> > > +{
>> > > +  for (int i=0; i<N; i++)
>> > > +  {
>> > > +    l3[i] = l1[i] *l2[i];
>> > > +  }
>> > > +}
>> > > +
>> > > +long long ll1[N], ll2[N], ll3[N];
>> > > +
>> > > +void
>> > > +test_long_long()
>> > > +{
>> > > +  for (int i=0;i<N;i++)
>> > > +  {
>> > > +    ll3[i] = ll1[i]*ll2[i];
>> > > +  }
>> > > +}
>> > > +
>> > > +/* { dg-final { scan-assembler-not "vmovdqa" } } */
>> > > +/* { dg-final { scan-assembler-not "vmovaps" } } */
>> > > +/* { dg-final { scan-assembler-not "vmovapd" } } */
>> > > +/* { dg-final { scan-assembler-times "vmovdqu" 19 { target lp64 } } } */
>> > > +/* { dg-final { scan-assembler-times "vmovdqu" 46 { target x32 } } } */
>> > > +/* { dg-final { scan-assembler-times "vmovdqu" 47 { target ia32 } } } */
>> > > +/* { dg-final { scan-assembler-times "vmovups" 2 } } */
>> > > +/* { dg-final { scan-assembler-times "vmovupd" 2 } } */
>> > > diff --git 
>> > > a/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c 
>> > > b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c
>> > > new file mode 100644
>> > > index 00000000000..65c81105ebd
>> > > --- /dev/null
>> > > +++ b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c
>> > > @@ -0,0 +1,107 @@
>> > > +/* { dg-do compile } */
>> > > +/* { dg-options "-O3 -mavx2 -muse-unaligned-vector-move" } */
>> > > +
>> > > +#include <immintrin.h>
>> > > +__m128 value128;
>> > > +char src128[16];
>> > > +
>> > > +__m256 value256;
>> > > +float src256[8];
>> > > +
>> > > +void add128(__m128* pointer) {
>> > > +    value128 = _mm_add_ps(value128, *pointer);
>> > > +}
>> > > +
>> > > +void add256(__m256* pointer) {
>> > > +    value256 = _mm256_add_ps(value256, *pointer);
>> > > +}
>> > > +
>> > > +__m128d value128d;
>> > > +__m128d aux128d;
>> > > +float src128f[4];
>> > > +float res128f[4];
>> > > +double src128d[2];
>> > > +double res128d[2];
>> > > +
>> > > +void add128d(__m128d* pointer, __m128d aux, __m128d* res128d) {
>> > > +    value128d = _mm_add_pd(value128d, *pointer);
>> > > +    __m128d s1 = _mm_add_pd(aux, *pointer);
>> > > +    *res128d = _mm_add_pd(s1, value128d);
>> > > +}
>> > > +
>> > > +__m256d value256d;
>> > > +__m256d aux256d;
>> > > +float src256f[8];
>> > > +float res256f[8];
>> > > +double src256d[4];
>> > > +double res256d[4];
>> > > +
>> > > +void add256d(__m256d* pointer, __m256d aux, __m256d* res) {
>> > > +    value256d = _mm256_add_pd(value256d, *pointer);
>> > > +    __m256d s1 = _mm256_add_pd(aux, *pointer);
>> > > +    *res = _mm256_add_pd(s1, value256d);
>> > > +}
>> > > +
>> > > +__m256i value256i;
>> > > +__m256i aux256i;
>> > > +char src256c[32];
>> > > +char res256c[32];
>> > > +short src256s[16];
>> > > +short res256s[16];
>> > > +int src256i[8];
>> > > +int res256i[8];
>> > > +long long src256l[4];
>> > > +long long res256l[4];
>> > > +
>> > > +void add256i(__m256i* pointer, __m256i aux, __m256i* res) {
>> > > +    value256i = _mm256_add_epi32(value256i, *pointer);
>> > > +    __m256i s1 = _mm256_add_epi32(aux, *pointer);
>> > > +    *res = _mm256_add_epi32(s1, value256i);
>> > > +}
>> > > +
>> > > +void foo1() {
>> > > +    add128((__m128*)src128);
>> > > +}
>> > > +
>> > > +void foo2() {
>> > > +    add256((__m256*)src256);
>> > > +}
>> > > +
>> > > +void foo3() {
>> > > +    add128d((__m128d*)src128d, aux128d, (__m128d*)res128d);
>> > > +}
>> > > +
>> > > +void foo4() {
>> > > +    add128d((__m128d*)src128f, aux128d, (__m128d*)res128f);
>> > > +}
>> > > +
>> > > +void foo5() {
>> > > +    add256d((__m256d*)src256f, aux256d, (__m256d*)res256f);
>> > > +}
>> > > +
>> > > +void foo6() {
>> > > +    add256d((__m256d*)src256d, aux256d, (__m256d*)res256d);
>> > > +}
>> > > +
>> > > +void foo7() {
>> > > +    add256i((__m256i*)src256c, aux256i, (__m256i*)res256c);
>> > > +}
>> > > +
>> > > +void foo8() {
>> > > +    add256i((__m256i*)src256s, aux256i, (__m256i*)res256s);
>> > > +}
>> > > +
>> > > +void foo9() {
>> > > +    add256i((__m256i*)src256i, aux256i, (__m256i*)res256i);
>> > > +}
>> > > +
>> > > +void foo11() {
>> > > +    add256i((__m256i*)src256l, aux256i, (__m256i*)res256l);
>> > > +}
>> > > +
>> > > +/* { dg-final { scan-assembler-not "vmovaps" } } */
>> > > +/* { dg-final { scan-assembler-not "vmovapd" } } */
>> > > +/* { dg-final { scan-assembler-not "vmovdqa" } } */
>> > > +/* { dg-final { scan-assembler "vmovups" } } */
>> > > +/* { dg-final { scan-assembler "vmovupd" } } */
>> > > +/* { dg-final { scan-assembler "vmovdqu" } } */
>> > > diff --git 
>> > > a/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c 
>> > > b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c
>> > > new file mode 100644
>> > > index 00000000000..59924304bae
>> > > --- /dev/null
>> > > +++ b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c
>> > > @@ -0,0 +1,11 @@
>> > > +/* { dg-do compile } */
>> > > +/* { dg-options "-O3 -mavx2 -mno-use-unaligned-vector-move" } */
>> > > +
>> > > +#include "avx2-vector-unaligned-load-store-2.c"
>> > > +
>> > > +/* { dg-final { scan-assembler-not "vmovups" { target { ! ia32 } } } } 
>> > > */
>> > > +/* { dg-final { scan-assembler-not "vmovupd" { target { ! ia32 } } } } 
>> > > */
>> > > +/* { dg-final { scan-assembler-not "vmovdqu" } } */
>> > > +/* { dg-final { scan-assembler "vmovaps" } } */
>> > > +/* { dg-final { scan-assembler "vmovapd" } } */
>> > > +/* { dg-final { scan-assembler "vmovdqa" } } */
>> > > diff --git 
>> > > a/gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c 
>> > > b/gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c
>> > > new file mode 100644
>> > > index 00000000000..3759fd9f2f4
>> > > --- /dev/null
>> > > +++ 
>> > > b/gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c
>> > > @@ -0,0 +1,13 @@
>> > > +/* { dg-do compile } */
>> > > +/* { dg-options "-O3 -mavx512vl -muse-unaligned-vector-move" } */
>> > > +
>> > > +#include "avx2-vector-unaligned-load-store-1.c"
>> > > +
>> > > +/* { dg-final { scan-assembler-not "vmovdqa32" } } */
>> > > +/* { dg-final { scan-assembler-not "vmovdqa64" } } */
>> > > +/* { dg-final { scan-assembler-not "vmovaps" } } */
>> > > +/* { dg-final { scan-assembler-not "vmovapd" } } */
>> > > +/* { dg-final { scan-assembler "vmovdqu32" } } */
>> > > +/* { dg-final { scan-assembler "vmovdqu64" } } */
>> > > +/* { dg-final { scan-assembler "vmovups" } } */
>> > > +/* { dg-final { scan-assembler "vmovupd" } } */
>> > > --
>> > > 2.18.1
>> > >

Re: [PATCH] X86: Add an option -muse-unaligned-vector-move

Reply via email to