https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442

            Bug ID: 98442
           Summary: [X86] suboptimal for memset with CLEAR_BY_PIECES
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
                CC: hjl.tools at gmail dot com, wei3.xiao at intel dot com,
                    wwwhhhyyy333 at gmail dot com
  Target Milestone: ---
            Target: x86_64-*-* i?86-*-*

cat test.c

--------
char Tab[64];
void foo(int n)
{
    for (int i = 0; i != 64; i++)
        Tab[i] = 0;
}
----


GCC generates:

------
foo(int):
  vpxor xmm0, xmm0, xmm0
  vmovdqa XMMWORD PTR Tab[rip], xmm0
  vmovdqa XMMWORD PTR Tab[rip+16], xmm0
  vmovdqa XMMWORD PTR Tab[rip+32], xmm0
  vmovdqa XMMWORD PTR Tab[rip+48], xmm0
  ret
Tab:
  .zero 64
---------

This could be better:

----
foo(int):
  vpxor ymm0, ymm0, ymm0
  vmovdqu YMMWORD PTR Tab[rip], ymm0
  vmovdqu YMMWORD PTR 32+Tab[rip], ymm0
  vzeroupper
  ret
Tab:
-----
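For reference, the same two-store sequence can be written by hand with AVX intrinsics. This is only a sketch of the desired instruction shape, not a proposed fix. The helper names clear_tab_avx and clear_tab_generic and the run-time __builtin_cpu_supports dispatch are my own illustrative additions, and the target("avx") attribute assumes GCC or Clang on x86.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

char Tab[64];

#if HAVE_X86
/* Illustrative helper: one zeroed 256-bit register and two unaligned
   32-byte stores, i.e. the vpxor / vmovdqu / vmovdqu shape above.
   target("avx") lets this build without -mavx, so callers must check
   for AVX at run time before calling it. */
__attribute__((target("avx")))
static void clear_tab_avx(void)
{
    __m256i zero = _mm256_setzero_si256();
    _mm256_storeu_si256((__m256i *)&Tab[0], zero);
    _mm256_storeu_si256((__m256i *)&Tab[32], zero);
}
#endif

/* Illustrative fallback: the original loop from test.c. */
static void clear_tab_generic(void)
{
    for (int i = 0; i != 64; i++)
        Tab[i] = 0;
}

void foo(int n)
{
    (void)n; /* unused, as in test.c */
#if HAVE_X86
    if (__builtin_cpu_supports("avx")) {
        clear_tab_avx();
        return;
    }
#endif
    clear_tab_generic();
}
```

Either path leaves all 64 bytes of Tab zero. Only the AVX path uses 32-byte stores.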

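GCC uses 128-bit pieces by default. In default_use_by_pieces_infrastructure_p, the maximum piece size for CLEAR_BY_PIECES is STORE_MAX_PIECES: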
----
bool
default_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
                                        unsigned int alignment,
                                        enum by_pieces_operation op,
                                        bool speed_p)
{
  unsigned int max_size = 0;
  unsigned int ratio = 0;

  switch (op)
    {
    case CLEAR_BY_PIECES:
      max_size = STORE_MAX_PIECES;
      ratio = CLEAR_RATIO (speed_p);
    ...
----
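A rough model of the decision, showing why the piece width matters for this test: with 16-byte pieces a 64-byte clear costs 4 stores, and with 32-byte pieces only 2. This is a simplified sketch, not GCC code. It assumes the hook compares a store count against ratio, and it ignores alignment and which machine modes actually exist. model_ninsns and model_use_by_pieces are hypothetical names.

```c
#include <assert.h>

/* Hypothetical, simplified store count for clearing SIZE bytes when the
   widest piece is MAX_SIZE bytes: greedily take the largest power-of-two
   piece that still fits, as if the destination were suitably aligned. */
static unsigned model_ninsns(unsigned long size, unsigned max_size)
{
    unsigned n = 0;
    assert(max_size != 0 && (max_size & (max_size - 1)) == 0);
    for (unsigned piece = max_size; piece != 0; piece >>= 1) {
        n += (unsigned)(size / piece);
        size %= piece;
    }
    return n;
}

/* Hypothetical: expand by pieces only if the store count is below RATIO. */
static int model_use_by_pieces(unsigned long size, unsigned max_size,
                               unsigned ratio)
{
    return model_ninsns(size, max_size) < ratio;
}
```

For example, with a made-up ratio of 4, model_use_by_pieces(64, 16, 4) rejects the by-pieces expansion while model_use_by_pieces(64, 32, 4) accepts it.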

Should i386 define its own TARGET_USE_BY_PIECES_INFRASTRUCTURE_P hook?
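If i386 did define the hook, the key change would be choosing a wider max_size when 256-bit stores are usable and preferred. Below is a hypothetical sketch of that choice. struct hypo_isa and hypo_store_max_pieces are illustrative names, not GCC target flags or macros, and the model ignores non-SSE and AVX-512 configurations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ISA/tuning inputs; not GCC's actual target flags. */
struct hypo_isa {
    bool avx;         /* 256-bit vector stores available */
    bool prefer_128;  /* tuning prefers 128-bit vectors */
};

/* Hypothetical: widest clear piece, in bytes, a target-specific hook
   could allow. Assumes at least SSE2 (16 bytes) as the baseline. */
static unsigned hypo_store_max_pieces(struct hypo_isa isa)
{
    if (isa.avx && !isa.prefer_128)
        return 32;
    return 16;
}
```

With avx set and prefer_128 clear, the 64-byte Tab clear needs 64 / 32 = 2 stores instead of 4.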
