On Thu, Aug 11, 2016 at 6:12 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Thu, Aug 11, 2016 at 5:51 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>
>>>>>>>>> Use TImode for piecewise move in 64-bit mode. When vector register
>>>>>>>>> is used for piecewise move, we don't increase stack_alignment_needed
>>>>>>>>> since vector register spill isn't required for piecewise move. Since
>>>>>>>>> stack_realign_needed is set to true by checking
>>>>>>>>> stack_alignment_estimated
>>>>>>>>> set by pseudo vector register usage, we also need to check
>>>>>>>>> stack_realign_needed to eliminate frame pointer.
>>>>>>>>
>>>>>>>> Why only in 64-bit mode? We can use SSE moves also in 32-bit mode.
>>>>>>>
>>>>>>> I will extend it to 32-bit mode.
>>>>>>
>>>>>> It doesn't work in 32-bit mode due to
>>>>>>
>>>>>> #define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_64BIT ? TImode :
>>>>>> DImode):
>>>>>>
>>>>>> /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
>>>>>> -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2
>>>>>> -fno-asynchronous-unwind-tables -m32 -S -o x.s x.i
>>>>>> x.i: In function ‘foo’:
>>>>>> x.i:6:10: internal compiler error: in by_pieces_ninsns, at expr.c:799
>>>>>> return __builtin_mempcpy (dst, src, 32);
>>>>>> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>
>>>>> This happens since by_pieces_ninsns determines widest mode by calling
>>>>> widest_*INT*_mode_for_size, while moves can also use vector-mode
>>>>> moves. This is an infrastructure problem, and will bite you on 64bit
>>>>> targets when MOVE_MAX_PIECES returns OImode or XImode size.
>>>>
>>>> I opened:
>>>>
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74113
>>>>
>>>>> +#define MOVE_MAX_PIECES \
>>>>> + ((TARGET_64BIT \
>>>>> + && TARGET_SSE2 \
>>>>> + && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
>>>>> + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) ? 16 : UNITS_PER_WORD)
>>>>>
>>>>> The above part is OK with an appropriate ??? comment, describing the
>>>>> infrastructure limitation. Also, please use GET_MODE_SIZE (TImode)
>>>>> instead of magic constant.
>>>>>
>>>>> Can you please submit the realignment patch as a separate follow-up
>>>>> patch? Let's keep two issues separate.
>>>>>
>>>>> Uros.
>>>>
>>>> Here is the updated patch. OK for trunk?
>>>
>>> OK, but please do not yet introduce:
>>>
>>> +/* No need to dynamically realign the stack here. */
>>> +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
>>> +/* Nor use a frame pointer. */
>>> +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
>>>
>>> in the testcases. This should be part of a followup patch.
>>
>> This is what I checked in.
>
> Playing a bit with a patched gcc, I found no stack realignment insns
> in the assembly of the provided testcases. However, if
> -mincoming-stack-boundary=3 is added, then no vector instructions are
> generated (and also no realignment insns).
Ah yes, the STV pass is disabled for -mincoming-stack-boundary={2,3}. It
looks like we don't need an extra realignment patch.

Uros.
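For readers who want to repeat the experiment, here is a minimal sketch of a testcase of the kind discussed above. Only the function name foo and the __builtin_mempcpy (dst, src, 32) call appear in the quoted x.i diagnostic; the parameter types and the check_foo helper are assumptions added for illustration, and the file as a whole is not the testcase that was actually checked in.

```c
/* Hypothetical reproducer modeled on the x.i fragment quoted in the
   thread.  Only foo and the __builtin_mempcpy call come from the
   original message; everything else is an assumption.  Requires a
   GCC-compatible compiler that provides the __builtin_* functions.  */
#include <assert.h>

void *
foo (void *dst, const void *src)
{
  /* A constant 32-byte copy is small enough to be a candidate for
     piecewise expansion instead of a library call.  */
  return __builtin_mempcpy (dst, src, 32);
}

/* Helper (not in the original): verify that foo copies 32 bytes and
   returns a pointer just past the copied region, as mempcpy does.
   Returns 1 on success.  */
int
check_foo (void)
{
  char src[32], dst[32];

  for (int i = 0; i < 32; i++)
    {
      src[i] = (char) i;
      dst[i] = 0;
    }

  void *end = foo (dst, src);
  assert (end == (void *) (dst + 32));
  return __builtin_memcmp (dst, src, 32) == 0;
}
```

Compiling this with the options quoted earlier (-O2 -fno-asynchronous-unwind-tables -S), once as-is and once with -mincoming-stack-boundary=3 added, and comparing the two assembly outputs for foo is one way to look for the difference in vector instructions described above. The exact output will depend on the compiler version and target.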