https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442
Bug ID: 98442
Summary: [X86] suboptimal for memset with CLEAR_BY_PIECES
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
CC: hjl.tools at gmail dot com, wei3.xiao at intel dot com,
wwwhhhyyy333 at gmail dot com
Target Milestone: ---
Target: x86_64-*-* i?86-*-*
cat test.c
--------
char Tab[64];
void foo(int n)
{
  for (int i = 0; i != 64; i++)
    Tab[i] = 0;
}
----
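The summary mentions memset although test.c uses a plain loop; presumably the loop is recognized as a 64-byte clear before expansion. Under that assumption (not stated in this report), an equivalent reproducer might be sketched as:

```c
#include <string.h>

char Tab[64];

/* Assumed-equivalent form of test.c: clear all 64 bytes of Tab
   with an explicit memset instead of a byte-by-byte loop.  */
void foo(int n)
{
  (void) n;  /* unused; kept to match the original signature */
  memset(Tab, 0, sizeof Tab);
}
```

Whether both forms expand through exactly the same path has not been checked here.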
GCC generates:
------
foo(int):
        vpxor   xmm0, xmm0, xmm0
        vmovdqa XMMWORD PTR Tab[rip], xmm0
        vmovdqa XMMWORD PTR Tab[rip+16], xmm0
        vmovdqa XMMWORD PTR Tab[rip+32], xmm0
        vmovdqa XMMWORD PTR Tab[rip+48], xmm0
        ret
Tab:
.zero 64
---------
It could be better, using two 256-bit stores instead of four 128-bit ones:
----
foo(int):
        vpxor      ymm0, ymm0, ymm0                    #4.5
        vmovdqu    YMMWORD PTR Tab[rip], ymm0          #4.5
        vmovdqu    YMMWORD PTR 32+Tab[rip], ymm0       #4.5
        vzeroupper                                     #6.1
        ret                                            #6.1
Tab:
-----
GCC uses 128-bit pieces by default, because the default hook caps CLEAR_BY_PIECES at STORE_MAX_PIECES:
----
bool
default_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
                                        unsigned int alignment,
                                        enum by_pieces_operation op,
                                        bool speed_p)
{
  unsigned int max_size = 0;
  unsigned int ratio = 0;
  switch (op)
    {
    case CLEAR_BY_PIECES:
      max_size = STORE_MAX_PIECES;
      ratio = CLEAR_RATIO (speed_p);
----
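To illustrate why the maximum piece size matters, here is a deliberately simplified model of the store count. The helper count_stores is hypothetical and is not GCC code; the real by-pieces logic also accounts for alignment and the machine modes the target supports.

```c
/* Hypothetical helper, not part of GCC: count the stores needed to
   clear SIZE bytes when the widest store is MAX_PIECE bytes (assumed
   to be a power of two).  Narrower power-of-two stores handle any
   tail.  Alignment is ignored in this sketch.  */
static unsigned int
count_stores (unsigned int size, unsigned int max_piece)
{
  unsigned int n = 0;
  for (unsigned int piece = max_piece; piece >= 1 && size > 0; piece /= 2)
    {
      n += size / piece;  /* as many stores of this width as fit */
      size %= piece;      /* remainder goes to narrower stores */
    }
  return n;
}
```

Under this model, count_stores (64, 16) gives 4 and count_stores (64, 32) gives 2, matching the four XMM stores and the two YMM stores in the listings above.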
Should i386 define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P?