https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704
--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to H.J. Lu from comment #4)
> (In reply to Hongtao.liu from comment #3)
> > (In reply to H.J. Lu from comment #1)
> > > ix86_expand_vector_move shouldn't use ix86_gen_scratch_sse_rtx.
> >
> > Is it problematic for TARGET_GEN_MEMSET_SCRATCH_RTX?
>
> It is OK as long as it is used only by memset expander.

Using gen_reg_rtx for TARGET_GEN_MEMSET_SCRATCH_RTX regresses:

gcc.target/i386/pieces-memset-21.c scan-assembler-not vzeroupper
gcc.target/i386/pieces-memset-3.c scan-assembler-not %[re]bp
gcc.target/i386/pieces-memset-3.c scan-assembler-not and[^\n\r]*%[re]sp
gcc.target/i386/pieces-memset-37.c scan-assembler-not %[re]bp
gcc.target/i386/pieces-memset-37.c scan-assembler-not and[^\n\r]*%[re]sp
gcc.target/i386/pieces-memset-39.c scan-assembler-not %[re]bp
gcc.target/i386/pieces-memset-39.c scan-assembler-not and[^\n\r]*%[re]sp
gcc.target/i386/pieces-memset-46.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
gcc.target/i386/pieces-memset-47.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
gcc.target/i386/pieces-memset-48.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
gcc.target/i386/pr90773-14.c scan-assembler-times movd[\\t ]+%xmm[0-9]+, 16\\(%[^,]+\\) 1
gcc.target/i386/pr90773-17.c scan-assembler-times vmovd[\\t ]+%xmm[0-9]+, 15\\(%[^,]+\\) 1
gcc.target/i386/pr90773-5.c scan-assembler-times movq[\\t ]+%xmm[0-9]+, 13\\(%[^,]+\\) 1
unix/-m32: gcc.dg/guality/vla-1.c -O2 -DPREVENT_OPTIMIZATION line 24 i == 5
unix/-m32: gcc.dg/guality/vla-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -DPREVENT_OPTIMIZATION line 24 i == 5
unix/-m32: gcc.dg/guality/vla-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 24 i == 5
unix/-m32: gcc.dg/guality/vla-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 24 sizeof (a) == 17 * sizeof (short)
unix/-m32: gcc.dg/guality/vla-1.c -O3 -g -DPREVENT_OPTIMIZATION line 24 i == 5
unix/-m32: gcc.target/i386/pieces-memset-3.c scan-assembler-not and[^\n\r]*%[re]sp
unix/-m32: gcc.target/i386/pieces-memset-37.c scan-assembler-not %[re]bp
unix/-m32: gcc.target/i386/pieces-memset-37.c scan-assembler-not and[^\n\r]*%[re]sp
unix/-m32: gcc.target/i386/pieces-memset-39.c scan-assembler-not %[re]bp
unix/-m32: gcc.target/i386/pieces-memset-39.c scan-assembler-not and[^\n\r]*%[re]sp
unix/-m32: gcc.target/i386/pieces-memset-46.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
unix/-m32: gcc.target/i386/pieces-memset-47.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
unix/-m32: gcc.target/i386/pieces-memset-48.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
unix/-m32: gcc.target/i386/pr90773-14.c scan-assembler-times movd[\\t ]+%xmm[0-9]+, 16\\(%[^,]+\\) 1
unix/-m32: gcc.target/i386/pr90773-17.c scan-assembler-times vmovd[\\t ]+%xmm[0-9]+, 15\\(%[^,]+\\) 1

These can be grouped into 4 categories:

1) Stack alignment is needed.

2) vzeroupper is needed.

3) RTL optimization rematerializes the vmovd from xmm as a movl of an
immediate, which seems to be more optimal:

        vpbroadcastb    %eax, %xmm31
-       vmovdqu8        %xmm31, (%rdx)
-       vmovd   %xmm31, 15(%rdx)
+       vpbroadcastb    %eax, %xmm0
+       vmovdqu8        %xmm0, (%rdx)
+       movl    $202116108, 15(%rdx)

4) Some debug info is missing after optimization (I think it's acceptable,
though we can try to pass down debug info; the related testcase is
gcc.dg/guality/vla-1.c).

I think 1) and 2) are acceptable since that is the same as GCC 11's behavior,
and 3) is better than current trunk. 4) is about debuggability; I'll try to
handle this.
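
For reference, the immediate in 3) is the broadcast byte replicated into a
32-bit word: 202116108 == 0x0c0c0c0c. The sketch below only illustrates that
arithmetic and why the overlapping 4-byte store at offset 15 still gives the
right result. The helper names are hypothetical, and the byte value 12 and
the 19-byte length are inferred from the constant and the store offsets, not
taken from the testcase source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all four bytes of a 32-bit word.
   (Hypothetical helper, for illustration only.)  */
static uint32_t
broadcast_byte_u32 (uint8_t b)
{
  return (uint32_t) b * UINT32_C (0x01010101);
}

/* Model of the two-piece expansion seen in the assembly: a 16-byte store
   at offset 0 followed by an overlapping 4-byte store at offset 15.
   Assumes the memset covers 19 bytes (inferred from the offsets).  */
static void
memset19_by_pieces (unsigned char *dst, uint8_t b)
{
  unsigned char vec[16];
  uint32_t tail = broadcast_byte_u32 (b);

  memset (vec, b, sizeof vec);            /* stands in for vpbroadcastb */
  memcpy (dst, vec, sizeof vec);          /* stands in for vmovdqu8 to (%rdx) */
  memcpy (dst + 15, &tail, sizeof tail);  /* stands in for the store to 15(%rdx) */
}
```

Since every byte of the tail word is identical, the result does not depend on
endianness, and the immediate form no longer needs the vector register for
the second store.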