On Sun, Aug 5, 2018 at 12:48 AM, H.J. Lu <[email protected]> wrote:
> On Sat, Aug 04, 2018 at 11:48:15PM +0200, Uros Bizjak wrote:
>> On Sat, Aug 4, 2018 at 9:49 PM, H.J. Lu <[email protected]> wrote:
>> > On Sat, Aug 4, 2018 at 12:09 PM, Uros Bizjak <[email protected]> wrote:
>> >> On Sat, Aug 4, 2018 at 3:59 PM, H.J. Lu <[email protected]> wrote:
>> >>> On Sat, Aug 4, 2018 at 3:42 AM, Uros Bizjak <[email protected]> wrote:
>> >>>> On Fri, Aug 3, 2018 at 12:55 AM, H.J. Lu <[email protected]> wrote:
>> >>>>> We should always set cfun->machine->max_used_stack_alignment if the
>> >>>>> maximum stack slot alignment may be greater than 64 bits.
>> >>>>>
>> >>>>> Tested on i686 and x86-64. OK for master and backport for GCC 8?
>> >>>>
>> >>>> Can you explain why 64 bits, and what this value represents? Is this
>> >>>> value the same for 64bit and 32bit targets?
>> >>>>
>> >>>> Should crtl->max_used_stack_slot_alignment be compared to
>> >>>> STACK_BOUNDARY or even MIN_STACK_BOUNDARY instead?
>> >>>
>> >>> In this case, we don't need to realign the incoming stack since both
>> >>> crtl->max_used_stack_slot_alignment and crtl->preferred_stack_boundary
>> >>> are 128 bits. We don't compute the largest alignment of stack slots to
>> >>> keep stack frame properly aligned for it. Normally it is OK. But if
>> >>> the largest alignment of stack slots > 64 bits and we don't keep stack
>> >>> frame proper aligned, we will get segfault if aligned vector load/store
>> >>> are used on these unaligned stack slots. My patch computes the
>> >>> largest alignment of stack slots, which are actually used, if the
>> >>> largest alignment of stack slots allocated is > 64 bits, which is
>> >>> the smallest alignment for misaligned load/store.
>> >>
>> >> Does > 64 bits also hold for 32 bit targets, and -mabi=ms? I think
>> >> that we need to compare to STACK_BOUNDARY instead:
>> >
>> > 64 bits requirement is independent of any psABIs nor 32-bit vs 64-bit.
>> > cfun->machine->max_used_stack_alignment is used to decide how
>> > stack frame should be aligned. It is always safe to compute it. I used
>> >
>> > else if (crtl->max_used_stack_slot_alignment > 64)
>> >
>> > to compute cfun->machine->max_used_stack_alignment only if
>> > we have to.
>> >
>> >> --cut here--
>> >> Index: config/i386/i386.c
>> >> ===================================================================
>> >> --- config/i386/i386.c (revision 263307)
>> >> +++ config/i386/i386.c (working copy)
>> >> @@ -13281,8 +13281,7 @@
>> >> recompute_frame_layout_p = true;
>> >> }
>> >> }
>> >> - else if (crtl->max_used_stack_slot_alignment
>> >> - > crtl->preferred_stack_boundary)
>> >> + else if (crtl->max_used_stack_slot_alignment > STACK_BOUNDARY)
>> >> {
>> >
>> > This isn't correct.. cygming.h has
>> >
>> > #define STACK_BOUNDARY (TARGET_64BIT && ix86_abi == MS_ABI ? 128 :
>> > BITS_PER_WORD)
>> >
>> > darwin.h has
>> >
>> > #undef STACK_BOUNDARY
>> > #define STACK_BOUNDARY \
>> > ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \
>> > ? 128 : BITS_PER_WORD)
>> >
>> > i386.h has
>> >
>> > /* Boundary (in *bits*) on which stack pointer should be aligned. */
>> > #define STACK_BOUNDARY \
>> > (TARGET_64BIT && ix86_abi == MS_ABI ? 128 : BITS_PER_WORD)
>> >
>> > If STACK_BOUNDARY is 128 and max_used_stack_slot_alignment is 128,
>> > we will get segment when 128bit aligned load/store is generated on
>> > misaligned
>> > stack slot.
>> >
>> >> /* We don't need to realign stack. But we still need to keep
>> >> stack frame properly aligned to satisfy the largest alignment
>> >> --cut here--
>> >>
>> >> (The testcase works OK with -mabi=ms, which somehow suggests that we
>> >> don't need realignment in this case).
>> >
>> > We may not hit 128bit aligned load/store on misaligned stack slot in this
>> > case. It doesn't mean that won't happen.
>> >
>> > else if (crtl->max_used_stack_slot_alignment > 64)
>> >
>> > is the correct thing to do here.
>>
>> OK, but please add a comment, so in the future we will still know the
>> purpose of the magic number.
>>
>
> Like this?
>
> H.J.
> ---
> cfun->machine->max_used_stack_alignment is used to decide how stack frame
> should be aligned. This is independent of any psABIs nor 32-bit vs 64-bit.
> It is always safe to compute max_used_stack_alignment. We compute it only
> if 128-bit aligned load/store may be generated on misaligned stack slot
> which will lead to segfault.
>
> gcc/
>
> PR target/86386
> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Set
> cfun->machine->max_used_stack_alignment if needed.
>
> gcc/testsuite/
>
> PR target/86386
> * gcc.target/i386/pr86386.c: New file.
OK, but please write the condition as ">= 128". The number 64 confused
me; I was thinking that it has something to do with minimum alignment
on 64bit target, while 128 clearly shows that alignment is related to
SSE moves. With ">= 128", I think that code is clear enough that a
long comment is not needed.
Thanks, and sorry for the confusion,
Uros.
> ---
> gcc/config/i386/i386.c | 14 +++++++------
> gcc/testsuite/gcc.target/i386/pr86386.c | 26 +++++++++++++++++++++++++
> 2 files changed, 34 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/pr86386.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ee409cfe7e4..cf8c33bd909 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -13281,12 +13281,14 @@ ix86_finalize_stack_frame_flags (void)
> recompute_frame_layout_p = true;
> }
> }
> - else if (crtl->max_used_stack_slot_alignment
> - > crtl->preferred_stack_boundary)
> - {
> - /* We don't need to realign stack. But we still need to keep
> - stack frame properly aligned to satisfy the largest alignment
> - of stack slots. */
> + else if (crtl->max_used_stack_slot_alignment > 64)
> + {
> + /* We don't need to realign stack. max_used_stack_alignment is
> + used to decide how stack frame should be aligned. This is
> + independent of any psABIs nor 32-bit vs 64-bit. It is always
> + safe to compute max_used_stack_alignment. We compute it only
> + if 128-bit aligned load/store may be generated on misaligned
> + stack slot which will lead to segfault. */
> if (ix86_find_max_used_stack_alignment (stack_alignment, true))
> cfun->machine->max_used_stack_alignment
> = stack_alignment / BITS_PER_UNIT;
> diff --git a/gcc/testsuite/gcc.target/i386/pr86386.c
> b/gcc/testsuite/gcc.target/i386/pr86386.c
> new file mode 100644
> index 00000000000..a67cf45444e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr86386.c
> @@ -0,0 +1,26 @@
> +/* PR target/86386 */
> +/* { dg-do run { target { avx_runtime && int128 } } } */
> +/* { dg-options "-Os -fno-tree-dce -mstringop-strategy=vector_loop -mavx" }
> */
> +
> +unsigned c, d, e, f;
> +
> +unsigned __attribute__((noipa))
> +foo (unsigned char g, unsigned short h, unsigned i, unsigned long long j,
> + unsigned char k, unsigned short l, unsigned m, unsigned __int128 n)
> +{
> + __builtin_memset (&e, 0, 3);
> + n <<= m;
> + __builtin_memcpy (&m, 2 + (char *) &n, 1);
> + m >>= 0;
> + d ^= __builtin_mul_overflow (l, n, &m);
> + return m;
> +}
> +
> +int
> +main ()
> +{
> + unsigned __int128 x = foo (0, 0, 0, 0, 0, 4, 1, 3);
> + if (x != 24)
> + __builtin_abort ();
> + return 0;
> +}
> --
> 2.17.1
>