https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121970

            Bug ID: 121970
           Summary: struct copy still use zmm even specify -mmove-max=256
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

typedef struct
{
    double ds[8];
}ds;

extern void bar (ds* );

void
foo (double* a, double* b, double* c, double* d, ds* __restrict e, int n)
{
    ds tmp[2];

    for (int j = 0 ; j != 1024; j++)
      for (int i = 0; i != 8 ; i++)
        tmp[0].ds[i] += a[8*j + i] * b[8*j + i];

    for (int j = 0; j != 1024; j++)
      for (int i = 0; i != 8; i++)
        tmp[1].ds[i] += c[8*j + i] * c[8*j + i];


    bar (tmp);
    e[0] = tmp[0];
    e[3] = tmp[1];
}

With -Ofast -march=sapphirerapids -mmove-max=256, zmm registers are still
used for the struct copies:

        call    bar(ds*)
        vmovdqu64       (%rsp), %zmm0
        vmovdqu64       %zmm0, (%rbx)
        vmovdqu64       64(%rsp), %zmm0
        vmovdqu64       %zmm0, 192(%rbx)
        vzeroupper
        movq    -8(%rbp), %rbx

With -Ofast -march=sapphirerapids -mstore-max=256 (and no -mmove-max), zmm is
still used.

Only when both options are given,
-Ofast -march=sapphirerapids -mstore-max=256 -mmove-max=256,
is ymm used for the struct copy.


https://godbolt.org/z/M739c9ejn

Not sure whether this is intentional or a bug.
