https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442
            Bug ID: 98442
           Summary: [X86] suboptimal for memset with CLEAR_BY_PIECES
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
                CC: hjl.tools at gmail dot com, wei3.xiao at intel dot com,
                    wwwhhhyyy333 at gmail dot com
  Target Milestone: ---
            Target: x86_64-*-* i?86-*-*

cat test.c
--------
char Tab[64];
void foo(int n)
{
  for (int i = 0; i != 64; i++)
    Tab[i] = 0;
}
----

GCC generates:
------
foo(int):
        vpxor   xmm0, xmm0, xmm0
        vmovdqa XMMWORD PTR Tab[rip], xmm0
        vmovdqa XMMWORD PTR Tab[rip+16], xmm0
        vmovdqa XMMWORD PTR Tab[rip+32], xmm0
        vmovdqa XMMWORD PTR Tab[rip+48], xmm0
        ret
Tab:
        .zero   64
---------

This could be better, using two 256-bit stores instead of four 128-bit ones:
----
foo(int):
        vpxor      ymm0, ymm0, ymm0                #4.5
        vmovdqu    YMMWORD PTR Tab[rip], ymm0      #4.5
        vmovdqu    YMMWORD PTR 32+Tab[rip], ymm0   #4.5
        vzeroupper                                 #6.1
        ret                                        #6.1
Tab:
-----

GCC uses 128-bit pieces by default:
----
bool
default_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
                                        unsigned int alignment,
                                        enum by_pieces_operation op,
                                        bool speed_p)
{
  unsigned int max_size = 0;
  unsigned int ratio = 0;

  switch (op)
    {
    case CLEAR_BY_PIECES:
      max_size = STORE_MAX_PIECES;
      ratio = CLEAR_RATIO (speed_p);
----

Should TARGET_USE_BY_PIECES_INFRASTRUCTURE_P be defined for i386?