On Thu, Aug 11, 2016 at 5:51 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>>>>> Use TImode for piecewise move in 64-bit mode. When vector register
>>>>>>>> is used for piecewise move, we don't increase stack_alignment_needed
>>>>>>>> since vector register spill isn't required for piecewise move. Since
>>>>>>>> stack_realign_needed is set to true by checking
>>>>>>>> stack_alignment_estimated
>>>>>>>> set by pseudo vector register usage, we also need to check
>>>>>>>> stack_realign_needed to eliminate frame pointer.
>>>>>>>
>>>>>>> Why only in 64-bit mode? We can use SSE moves also in 32-bit mode.
>>>>>>
>>>>>> I will extend it to 32-bit mode.
>>>>>
>>>>> It doesn't work in 32-bit mode due to
>>>>>
>>>>> #define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_64BIT ? TImode :
>>>>> DImode):
>>>>>
>>>>> /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
>>>>> -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2
>>>>> -fno-asynchronous-unwind-tables -m32 -S -o x.s x.i
>>>>> x.i: In function ‘foo’:
>>>>> x.i:6:10: internal compiler error: in by_pieces_ninsns, at expr.c:799
>>>>> return __builtin_mempcpy (dst, src, 32);
>>>>> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>
>>>> This happens since by_pieces_ninsns determines widest mode by calling
>>>> widest_*INT*_mode_for_size, while moves can also use vector-mode
>>>> moves. This is an infrastructure problem, and will bite you on 64bit
>>>> targets when MOVE_MAX_PIECES returns OImode or XImode size.
>>>
>>> I opened:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74113
>>>
>>>> +#define MOVE_MAX_PIECES \
>>>> + ((TARGET_64BIT \
>>>> + && TARGET_SSE2 \
>>>> + && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
>>>> + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) ? 16 : UNITS_PER_WORD)
>>>>
>>>> The above part is OK with an appropriate ??? comment, describing the
>>>> infrastructure limitation. Also, please use GET_MODE_SIZE (TImode)
>>>> instead of magic constant.
>>>>
>>>> Can you please submit the realignment patch as a separate follow-up
>>>> patch? Let's keep two issues separate.
>>>>
>>>> Uros.
>>>
>>> Here is the updated patch. OK for trunk?
>>
>> OK, but please do not yet introduce:
>>
>> +/* No need to dynamically realign the stack here. */
>> +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
>> +/* Nor use a frame pointer. */
>> +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
>>
>> in the testcases. This should be part of a followup patch.
>
> This is what I checked in.

Playing a bit with a patched gcc, I found no stack realignment insns
in the assembly of the provided testcases. However, if
-mincoming-stack-boundary=3 is added, then no vector instructions are
generated (and also no realignment insns).

Uros.