On Thu, Aug 11, 2016 at 1:16 AM, Uros Bizjak <ubiz...@gmail.com> wrote: > On Wed, Aug 10, 2016 at 6:44 PM, H.J. Lu <hjl.to...@gmail.com> wrote: > >>>>> Use TImode for piecewise move in 64-bit mode. When vector register >>>>> is used for piecewise move, we don't increase stack_alignment_needed >>>>> since vector register spill isn't required for piecewise move. Since >>>>> stack_realign_needed is set to true by checking stack_alignment_estimated >>>>> set by pseudo vector register usage, we also need to check >>>>> stack_realign_needed to eliminate frame pointer. >>>> >>>> Why only in 64-bit mode? We can use SSE moves also in 32-bit mode. >>> >>> I will extend it to 32-bit mode. >> >> It doesn't work in 32-bit mode due to >> >> #define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_64BIT ? TImode : >> DImode): >> >> /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc >> -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2 >> -fno-asynchronous-unwind-tables -m32 -S -o x.s x.i >> x.i: In function ‘foo’: >> x.i:6:10: internal compiler error: in by_pieces_ninsns, at expr.c:799 >> return __builtin_mempcpy (dst, src, 32); >> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > This happens since by_pieces_ninsns determines widest mode by calling > widest_*INT*_mode_for_size, while moves can also use vector-mode > moves. This is an infrastructure problem, and will bite you on 64bit > targets when MOVE_MAX_PIECES returns OImode or XImode size.
I opened: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74113 > +#define MOVE_MAX_PIECES \ > + ((TARGET_64BIT \ > + && TARGET_SSE2 \ > + && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ > + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) ? 16 : UNITS_PER_WORD) > > The above part is OK with an appropriate ??? comment, describing the > infrastructure limitation. Also, please use GET_MODE_SIZE (TImode) > instead of magic constant. > > Can you please submit the realignment patch as a separate follow-up > patch? Let's keep two issues separate. > > Uros. Here is the updated patch. OK for trunk? -- H.J.
From b3caa00a2164b44148552bb82bf616e0aeba5ea7 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.to...@gmail.com> Date: Mon, 8 Aug 2016 09:08:01 -0700 Subject: [PATCH] Use TImode for piecewise move in 64-bit mode Use TImode for piecewise move in 64-bit mode. We should use TImode in 32-bit mode and use OImode or XImode if they are available. But since by_pieces_ninsns determines the widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in 64-bit mode. gcc/ * config/i386/i386.h (MOVE_MAX_PIECES): Use TImode in 64-bit mode if unaligned SSE load and store are optimal. gcc/testsuite/ * gcc.target/i386/pieces-memcpy-1.c: New test. * gcc.target/i386/pieces-memcpy-2.c: Likewise. * gcc.target/i386/pieces-memcpy-3.c: Likewise. * gcc.target/i386/pieces-memcpy-4.c: Likewise. * gcc.target/i386/pieces-memcpy-5.c: Likewise. * gcc.target/i386/pieces-memcpy-6.c: Likewise. --- gcc/config/i386/i386.h | 14 ++++++++++++-- gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c | 17 +++++++++++++++++ gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c | 17 +++++++++++++++++ gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c | 17 +++++++++++++++++ gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c | 17 +++++++++++++++++ gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c | 17 +++++++++++++++++ gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c | 17 +++++++++++++++++ 7 files changed, 114 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 9b66264..8751143 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1950,8 +1950,18 @@ typedef struct ix86_args { /* MOVE_MAX_PIECES is the number of bytes at a time which we can move efficiently, as opposed to MOVE_MAX which is the maximum - number of bytes we can move with a single instruction. */ -#define MOVE_MAX_PIECES UNITS_PER_WORD + number of bytes we can move with a single instruction. + + ??? We should use TImode in 32-bit mode and use OImode or XImode + if they are available. But since by_pieces_ninsns determines the + widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in + 64-bit mode. */ +#define MOVE_MAX_PIECES \ + ((TARGET_64BIT \ + && TARGET_SSE2 \ + && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \ + ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD) /* If a memory-to-memory move would take MOVE_RATIO or more simple move-instruction pairs, we will do a movmem or libcall instead. diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c new file mode 100644 index 0000000..adc0aa8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 64); +} + +/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c new file mode 100644 index 0000000..c52c1d9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 33); +} + +/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c new file mode 100644 index 0000000..c532bbd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 17); +} + +/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c new file mode 100644 index 0000000..4ef763d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 18); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovups\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c new file mode 100644 index 0000000..2687560 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mavx512f -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 19); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovups\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c new file mode 100644 index 0000000..a205f83 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 33); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* { dg-final { scan-assembler-times "vmovups\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ -- 2.7.4