On Sat, Aug 04, 2018 at 11:48:15PM +0200, Uros Bizjak wrote: > On Sat, Aug 4, 2018 at 9:49 PM, H.J. Lu <hjl.to...@gmail.com> wrote: > > On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <ubiz...@gmail.com> wrote: > >> On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <hjl.to...@gmail.com> wrote: > >>> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <ubiz...@gmail.com> wrote: > >>>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <hongjiu...@intel.com> wrote: > >>>>> We should always set cfun->machine->max_used_stack_alignment if the > >>>>> maximum stack slot alignment may be greater than 64 bits. > >>>>> > >>>>> Tested on i686 and x86-64. OK for master and backport for GCC 8? > >>>> > >>>> Can you explain why 64 bits, and what this value represents? Is this > >>>> value the same for 64bit and 32bit targets? > >>>> > >>>> Should crtl->max_used_stack_slot_alignment be compared to > >>>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead? > >>> > >>> In this case, we don't need to realign the incoming stack since both > >>> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary > >>> are 128 bits. We don't compute the largest alignment of stack slots to > >>> keep stack frame properly aligned for it. Normally it is OK. But if > >>> the largest alignment of stack slots > 64 bits and we don't keep stack > >>> frame proper aligned, we will get segfault if aligned vector load/store > >>> are used on these unaligned stack slots. My patch computes the > >>> largest alignment of stack slots, which are actually used, if the > >>> largest alignment of stack slots allocated is > 64 bits, which is > >>> the smallest alignment for misaligned load/store. > >> > >> Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think > >> that we need to compare to STACK_BOUNDARY instead: > > > > 64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit. > > cfun->machine->max_used_stack_alignment is used to decide how > > stack frame should be aligned. It is always safe to compute it. I used > > > > else if (crtl->max_used_stack_slot_alignment > 64) > > > > to compute cfun->machine->max_used_stack_alignment only if > > we have to. > > > >> --cut here-- > >> Index: config/i386/i386.c > >> =================================================================== > >> --- config/i386/i386.c (revision 263307) > >> +++ config/i386/i386.c (working copy) > >> @@ -13281,8 +13281,7 @@ > >> recompute_frame_layout_p = true; > >> } > >> } > >> - else if (crtl->max_used_stack_slot_alignment > >> - > crtl->preferred_stack_boundary) > >> + else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY) > >> { > > > > This isn't correct.. cygming.h has > > > > #define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : > > BITS_PER_WORD) > > > > darwin.h has > > > > #undef STACK_BOUNDARY > > #define STACK_BOUNDARY \ > > ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \ > > ? 128 : BITS_PER_WORD) > > > > i386.h has > > > > /* Boundary (in *bits*) on which stack pointer should be aligned. */ > > #define STACK_BOUNDARY \ > > (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD) > > > > If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128, > > we will get segment when 128bit aligned load/store is generated on > > misaligned > > stack slot. > > > >> /* We don't need to realign stack. But we still need to keep > >> stack frame properly aligned to satisfy the largest alignment > >> --cut here-- > >> > >> (The testcase works OK with -mabi=ms, which somehow suggests that we > >> don't need realignment in this case). > > > > We may not hit 128bit aligned load/store on misaligned stack slot in this > > case. It doesn't mean that won't happen. > > > > else if (crtl->max_used_stack_slot_alignment > 64) > > > > is the correct thing to do here. > > OK, but please add a comment, so in the future we will still know the > purpose of the magic number. >
Like this? H.J. --- cfun->machine->max_used_stack_alignment is used to decide how stack frame should be aligned. This is independent of any psABIs nor 32-bit vs 64-bit. It is always safe to compute max_used_stack_alignment. We compute it only if 128-bit aligned load/store may be generated on misaligned stack slot which will lead to segfault. gcc/ PR target/86386 * config/i386/i386.c (ix86_finalize_stack_frame_flags): Set cfun->machine->max_used_stack_alignment if needed. gcc/testsuite/ PR target/86386 * gcc.target/i386/pr86386.c: New file. --- gcc/config/i386/i386.c | 14 +++++++------ gcc/testsuite/gcc.target/i386/pr86386.c | 26 +++++++++++++++++++++++++ 2 files changed, 34 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr86386.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ee409cfe7e4..cf8c33bd909 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13281,12 +13281,14 @@ ix86_finalize_stack_frame_flags (void) recompute_frame_layout_p = true; } } - else if (crtl->max_used_stack_slot_alignment - > crtl->preferred_stack_boundary) - { - /* We don't need to realign stack. But we still need to keep - stack frame properly aligned to satisfy the largest alignment - of stack slots. */ + else if (crtl->max_used_stack_slot_alignment > 64) + { + /* We don't need to realign stack. max_used_stack_alignment is + used to decide how stack frame should be aligned. This is + independent of any psABIs nor 32-bit vs 64-bit. It is always + safe to compute max_used_stack_alignment. We compute it only + if 128-bit aligned load/store may be generated on misaligned + stack slot which will lead to segfault. */ if (ix86_find_max_used_stack_alignment (stack_alignment, true)) cfun->machine->max_used_stack_alignment = stack_alignment / BITS_PER_UNIT; diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c b/gcc/testsuite/gcc.target/i386/pr86386.c new file mode 100644 index 00000000000..a67cf45444e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr86386.c @@ -0,0 +1,26 @@ +/* PR target/86386 */ +/* { dg-do run { target { avx_runtime && int128 } } } */ +/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" } */ + +unsigned c, d, e, f; + +unsigned __attribute__((noipa)) +foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j, + unsigned char k, unsigned short l, unsigned m, unsigned __int128 n) +{ + __builtin_memset (&e, 0, 3); + n <<= m; + __builtin_memcpy (&m, 2 + (char *) &n, 1); + m >>= 0; + d ^= __builtin_mul_overflow (l, n, &m); + return m; +} + +int +main () +{ + unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3); + if (x != 24) + __builtin_abort (); + return 0; +} -- 2.17.1