Finally a patch that works and is simple.  Bootstrapped and
regression tested on s390, s390x biarch and x86_64.  The new patch
exploits the known alignment of (stack pointer +
STACK_DYNAMIC_OFFSET) as described earlier (see below).  I think
that is the right way to get rid of the extra allocation.  It
took a long time to understand the problem.

As the patch triggers a bug in the fortran compiler, the
der_type.f90 test case may fail on some targets if this patch is
used without the fortran fix that I've posted in another thread.

(The patch also contains a fix for a typo in a comment in the
patched function.)

See ChangeLog for a full description of the new patch.

Since the patch is all new, we're not going to commit it without a
new OK.

> Actually I was goind to abandon the patch in its current state.
> :-)  We talked about it internally and concluded that the problem
> is really this:
> 
>  * get_dynamic_stack_size is passed a SIZE of a data block (which
>    is allocated elsewhere), the SIZE_ALIGN of the SIZE (i.e. the
>    alignment of the underlying memory units (e.g. 32 bytes split
>    into 4 times 8 bytes = 64 bit alignment) and the
>    REQUIRED_ALIGN of the data portion of the allocated memory.
>  * Assuming the function is called with SIZE = 2, SIZE_ALIGN = 8
>    and REQUIRED_ALIGN = 64 it first adds 7 bytes to SIZE -> 9.
>    This is what is needed to have two bytes 8-byte-aligned at some
>    memory location without any known alignment.
>  * Finally round_push is called to round up SIZE to a multiple of
>    the stack slot size.
> 
> The key to understanding this is that the function assumes that
> STACK_DYNMAIC_OFFSET is completely unknown at the time its called
> and therefore it does not make assumptions about the alignment of
> STACKPOINTER + STACK_DYNMAIC_OFFSET.  The latest patch simply
> hard-codes that SP + SDO is supposed to be aligned to at least
> stack slot size (and does that in a very complicated way).  Since
> there is no guarantee that this is the case on all targets, the
> patch is broken.  It may miscalculate a SIZE that is too small in
> some cases.
> 
> However, on many targets there is some guarantee about the
> alignment of SP + SDO even if the actual value of SDO is unknown.
> On s390x it's always 8-byte-aligned (stack slot size).  So the
> right fix should be to add knowledge about the target's guaranteed
> alignment of SP + SDO to the function.  I'm right now testing a
> much simpler patch that uses
> REGNO_POINTER_ALIGN(VIRTUAL_STACK_DYNAMIC_REGNUM) as the
> alignment.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

        * explow.c (get_dynamic_stack_size): Take known alignment of stack
        pointer + STACK_DYNAMIC_OFFSET into account when calculating the size
        needed.
        Correct a typo in a comment.
>From 7d7a6b7fdb189759eb11b96c93ee4adfa8608a97 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <v...@linux.vnet.ibm.com>
Date: Fri, 29 Apr 2016 08:36:59 +0100
Subject: [PATCH] Reduce size allocated for run time allocated stack
 variables.

The present calculation sometimes led to more stack memory being used than
necessary with alloca.  First, (STACK_BOUNDARY -1) would be added to the
allocated size:

  size = plus_constant (Pmode, size, extra);
  size = force_operand (size, NULL_RTX);

Then round_push was called and added another (STACK_BOUNDARY - 1) before
rounding down to a multiple of STACK_BOUNDARY.  On s390x this resulted in
adding 14 before rounding down for "x" in the test case pr36728-1.c.

The problem was that get_dynamic_stack_size did not take into account that the
target might guarantee some alignment of (stack_pointer + STACK_DYNAMIC_OFFSET)
even if the value of STACK_DYNAMIC_OFFSET is not known yet.
---
 gcc/explow.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/gcc/explow.c b/gcc/explow.c
index a345690..f97a214 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1224,9 +1224,15 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align,
      example), so we must preventively align the value.  We leave space
      in SIZE for the hole that might result from the alignment operation.  */
 
-  extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
-  size = plus_constant (Pmode, size, extra);
-  size = force_operand (size, NULL_RTX);
+  unsigned known_align = REGNO_POINTER_ALIGN (VIRTUAL_STACK_DYNAMIC_REGNUM);
+  if (known_align == 0)
+    known_align = BITS_PER_UNIT;
+  if (required_align > known_align)
+    {
+      extra = (required_align - known_align) / BITS_PER_UNIT;
+      size = plus_constant (Pmode, size, extra);
+      size = force_operand (size, NULL_RTX);
+    }
 
   if (flag_stack_usage_info && pstack_usage_size)
     *pstack_usage_size += extra;
@@ -1235,7 +1241,7 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align,
     size_align = BITS_PER_UNIT;
 
   /* Round the size to a multiple of the required stack alignment.
-     Since the stack if presumed to be rounded before this allocation,
+     Since the stack is presumed to be rounded before this allocation,
      this will maintain the required alignment.
 
      If the stack grows downward, we could save an insn by subtracting
-- 
2.3.0

Reply via email to