On Thu, Aug 11, 2016 at 8:41 AM, Uros Bizjak <ubiz...@gmail.com> wrote: > On Thu, Aug 11, 2016 at 5:26 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >> On Thu, Aug 11, 2016 at 1:16 AM, Uros Bizjak <ubiz...@gmail.com> wrote: >>> On Wed, Aug 10, 2016 at 6:44 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >>> >>>>>>> Use TImode for piecewise move in 64-bit mode. When vector register >>>>>>> is used for piecewise move, we don't increase stack_alignment_needed >>>>>>> since vector register spill isn't required for piecewise move. Since >>>>>>> stack_realign_needed is set to true by checking >>>>>>> stack_alignment_estimated >>>>>>> set by pseudo vector register usage, we also need to check >>>>>>> stack_realign_needed to eliminate frame pointer. >>>>>> >>>>>> Why only in 64-bit mode? We can use SSE moves also in 32-bit mode. >>>>> >>>>> I will extend it to 32-bit mode. >>>> >>>> It doesn't work in 32-bit mode due to >>>> >>>> #define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_64BIT ? TImode : >>>> DImode): >>>> >>>> /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc >>>> -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2 >>>> -fno-asynchronous-unwind-tables -m32 -S -o x.s x.i >>>> x.i: In function ‘foo’: >>>> x.i:6:10: internal compiler error: in by_pieces_ninsns, at expr.c:799 >>>> return __builtin_mempcpy (dst, src, 32); >>>> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> >>> This happens since by_pieces_ninsns determines widest mode by calling >>> widest_*INT*_mode_for_size, while moves can also use vector-mode >>> moves. This is an infrastructure problem, and will bite you on 64bit >>> targets when MOVE_MAX_PIECES returns OImode or XImode size. >> >> I opened: >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74113 >> >>> +#define MOVE_MAX_PIECES \ >>> + ((TARGET_64BIT \ >>> + && TARGET_SSE2 \ >>> + && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ >>> + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) ? 16 : UNITS_PER_WORD) >>> >>> The above part is OK with an appropriate ??? comment, describing the >>> infrastructure limitation. Also, please use GET_MODE_SIZE (TImode) >>> instead of magic constant. >>> >>> Can you please submit the realignment patch as a separate follow-up >>> patch? Let's keep two issues separate. >>> >>> Uros. >> >> Here is the updated patch. OK for trunk? > > OK, but please do not yet introduce: > > +/* No need to dynamically realign the stack here. */ > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ > +/* Nor use a frame pointer. */ > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ > > in the testcases. This should be part of a followup patch. > > Thanks, > Uros.
This is what I checked in. Thanks. -- H.J.
From e52d6c1aed5a688fd28b9f6e005c6d9f3857b9cf Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.to...@gmail.com> Date: Mon, 8 Aug 2016 09:08:01 -0700 Subject: [PATCH] Use TImode for piecewise move in 64-bit mode Use TImode for piecewise move in 64-bit mode. We should use TImode in 32-bit mode and use OImode or XImode if they are available. But since by_pieces_ninsns determines the widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in 64-bit mode. gcc/ * config/i386/i386.h (MOVE_MAX_PIECES): Use TImode in 64-bit mode if unaligned SSE load and store are optimal. gcc/testsuite/ * gcc.target/i386/pieces-memcpy-1.c: New test. * gcc.target/i386/pieces-memcpy-2.c: Likewise. * gcc.target/i386/pieces-memcpy-3.c: Likewise. * gcc.target/i386/pieces-memcpy-4.c: Likewise. * gcc.target/i386/pieces-memcpy-5.c: Likewise. * gcc.target/i386/pieces-memcpy-6.c: Likewise. --- gcc/config/i386/i386.h | 14 ++++++++++++-- gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c | 13 +++++++++++++ gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c | 13 +++++++++++++ gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c | 13 +++++++++++++ gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c | 13 +++++++++++++ gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c | 13 +++++++++++++ gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c | 13 +++++++++++++ 7 files changed, 90 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 9b66264..8751143 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1950,8 +1950,18 @@ typedef struct ix86_args { /* MOVE_MAX_PIECES is the number of bytes at a time which we can move efficiently, as opposed to MOVE_MAX which is the maximum - number of bytes we can move with a single instruction. */ -#define MOVE_MAX_PIECES UNITS_PER_WORD + number of bytes we can move with a single instruction. + + ??? We should use TImode in 32-bit mode and use OImode or XImode + if they are available. But since by_pieces_ninsns determines the + widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in + 64-bit mode. */ +#define MOVE_MAX_PIECES \ + ((TARGET_64BIT \ + && TARGET_SSE2 \ + && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \ + ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD) /* If a memory-to-memory move would take MOVE_RATIO or more simple move-instruction pairs, we will do a movmem or libcall instead. diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c new file mode 100644 index 0000000..22202c2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 64); +} + +/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c new file mode 100644 index 0000000..bc4f05b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 33); +} + +/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c new file mode 100644 index 0000000..84d6676 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 17); +} + +/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c new file mode 100644 index 0000000..64e8921 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 18); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovups\[ \\t\]+\[^\n\]*%xmm" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c new file mode 100644 index 0000000..3c464c3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mavx512f -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 19); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovups\[ \\t\]+\[^\n\]*%xmm" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c new file mode 100644 index 0000000..cdb00e0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 33); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* { dg-final { scan-assembler-times "vmovups\[ \\t\]+\[^\n\]*%xmm" 2 } } */ -- 2.7.4