https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to H.J. Lu from comment #4)
> (In reply to Hongtao.liu from comment #3)
> > (In reply to H.J. Lu from comment #1)
> > > ix86_expand_vector_move shouldn't use ix86_gen_scratch_sse_rtx.
> > 
> > Is it problematic for TARGET_GEN_MEMSET_SCRATCH_RTX?
> 
> It is OK as long as it is used only by memset expander.

Using gen_reg_rtx for TARGET_GEN_MEMSET_SCRATCH_RTX regresses:

gcc.target/i386/pieces-memset-21.c scan-assembler-not vzeroupper
gcc.target/i386/pieces-memset-3.c scan-assembler-not %[re]bp
gcc.target/i386/pieces-memset-3.c scan-assembler-not and[^\n\r]*%[re]sp
gcc.target/i386/pieces-memset-37.c scan-assembler-not %[re]bp
gcc.target/i386/pieces-memset-37.c scan-assembler-not and[^\n\r]*%[re]sp
gcc.target/i386/pieces-memset-39.c scan-assembler-not %[re]bp
gcc.target/i386/pieces-memset-39.c scan-assembler-not and[^\n\r]*%[re]sp
gcc.target/i386/pieces-memset-46.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
gcc.target/i386/pieces-memset-47.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
gcc.target/i386/pieces-memset-48.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
gcc.target/i386/pr90773-14.c scan-assembler-times movd[\\t ]+%xmm[0-9]+, 16\\(%[^,]+\\) 1
gcc.target/i386/pr90773-17.c scan-assembler-times vmovd[\\t ]+%xmm[0-9]+, 15\\(%[^,]+\\) 1
gcc.target/i386/pr90773-5.c scan-assembler-times movq[\\t ]+%xmm[0-9]+, 13\\(%[^,]+\\) 1
unix/-m32: gcc.dg/guality/vla-1.c   -O2  -DPREVENT_OPTIMIZATION  line 24 i == 5
unix/-m32: gcc.dg/guality/vla-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  -DPREVENT_OPTIMIZATION line 24 i == 5
unix/-m32: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 24 i == 5
unix/-m32: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 24 sizeof (a) == 17 * sizeof (short)
unix/-m32: gcc.dg/guality/vla-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  line 24 i == 5
unix/-m32: gcc.target/i386/pieces-memset-3.c scan-assembler-not and[^\n\r]*%[re]sp
unix/-m32: gcc.target/i386/pieces-memset-37.c scan-assembler-not %[re]bp
unix/-m32: gcc.target/i386/pieces-memset-37.c scan-assembler-not and[^\n\r]*%[re]sp
unix/-m32: gcc.target/i386/pieces-memset-39.c scan-assembler-not %[re]bp
unix/-m32: gcc.target/i386/pieces-memset-39.c scan-assembler-not and[^\n\r]*%[re]sp
unix/-m32: gcc.target/i386/pieces-memset-46.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
unix/-m32: gcc.target/i386/pieces-memset-47.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
unix/-m32: gcc.target/i386/pieces-memset-48.c scan-assembler-times vmovw[ \\t]+[^\n]*%xmm 1
unix/-m32: gcc.target/i386/pr90773-14.c scan-assembler-times movd[\\t ]+%xmm[0-9]+, 16\\(%[^,]+\\) 1
unix/-m32: gcc.target/i386/pr90773-17.c scan-assembler-times vmovd[\\t ]+%xmm[0-9]+, 15\\(%[^,]+\\) 1

These regressions can be grouped into 4 categories:

1) Stack alignment is needed.
2) vzeroupper is needed.
3) RTL optimization rematerializes the vmovd from xmm as a movl of an
immediate, which seems to be better:

-       vpbroadcastb    %eax, %xmm31
-       vmovdqu8        %xmm31, (%rdx)
-       vmovd   %xmm31, 15(%rdx)
+       vpbroadcastb    %eax, %xmm0
+       vmovdqu8        %xmm0, (%rdx)
+       movl    $202116108, 15(%rdx)

4) Some debug info is missing after optimization (I think that's acceptable,
though we can try to pass the debug info down; the related testcase is
gcc.dg/guality/vla-1.c).

I think 1) and 2) are acceptable since the behavior is the same as GCC 11's;
3) is better than current trunk; 4) is about debuggability, and I'll try to
handle it.
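
For reference, the immediate in 3) is just a replicated byte: 202116108 is
0x0c0c0c0c, i.e. the byte 0x0c repeated four times, which is why a movl of a
constant can stand in for the vmovd from the broadcast register. A minimal
sketch of that arithmetic follows; broadcast_byte32 and fill_byte_of are
hypothetical helper names for illustration, not GCC functions, and the fill
value 0x0c is inferred from the immediate rather than taken from the testcase.

```c
#include <stdint.h>

/* Replicate one byte into all four bytes of a 32-bit word, the way a
   memset fill value shows up in a 4-byte store.  Hypothetical helper,
   not part of GCC.  */
static uint32_t
broadcast_byte32 (uint8_t c)
{
  /* Multiplying by 0x01010101 copies C into every byte lane.  */
  return (uint32_t) c * UINT32_C (0x01010101);
}

/* Return the fill byte if IMM consists of one byte repeated four times,
   otherwise -1.  Hypothetical helper, not part of GCC.  */
static int
fill_byte_of (uint32_t imm)
{
  uint8_t c = (uint8_t) (imm & 0xff);
  return broadcast_byte32 (c) == imm ? (int) c : -1;
}
```

Under these assumptions, broadcast_byte32 (0x0c) yields 202116108, matching
the movl immediate in the diff above.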
