Finally a patch that works and is simple. Bootstrapped and regression tested on s390, s390x biarch and x86_64. The new patch exploits the known alignment of (stack pointer + STACK_DYNAMIC_OFFSET) as described earlier (see below). I think that is the right way to get rid of the extra allocation. It took a long time to understand the problem.
As the patch triggers a bug in the fortran compiler, the der_type.f90 test case may fail on some targets if this patch is used without the fortran fix that I've posted in another thread. (The patch also contains a fix for a typo in a comment in the patched function.) See ChangeLog for a full description of the new patch. Since the patch is all new, we're not going to commit it without a new OK. > Actually I was goind to abandon the patch in its current state. > :-) We talked about it internally and concluded that the problem > is really this: > > * get_dynamic_stack_size is passed a SIZE of a data block (which > is allocated elsewhere), the SIZE_ALIGN of the SIZE (i.e. the > alignment of the underlying memory units (e.g. 32 bytes split > into 4 times 8 bytes = 64 bit alignment) and the > REQUIRED_ALIGN of the data portion of the allocated memory. > * Assuming the function is called with SIZE = 2, SIZE_ALIGN = 8 > and REQUIRED_ALIGN = 64 it first adds 7 bytes to SIZE -> 9. > This is what is needed to have two bytes 8-byte-aligned at some > memory location without any known alignment. > * Finally round_push is called to round up SIZE to a multiple of > the stack slot size. > > The key to understanding this is that the function assumes that > STACK_DYNMAIC_OFFSET is completely unknown at the time its called > and therefore it does not make assumptions about the alignment of > STACKPOINTER + STACK_DYNMAIC_OFFSET. The latest patch simply > hard-codes that SP + SDO is supposed to be aligned to at least > stack slot size (and does that in a very complicated way). Since > there is no guarantee that this is the case on all targets, the > patch is broken. It may miscalculate a SIZE that is too small in > some cases. > > However, on many targets there is some guarantee about the > alignment of SP + SDO even if the actual value of SDO is unknown. > On s390x it's always 8-byte-aligned (stack slot size). So the > right fix should be to add knowledge about the target's guaranteed > alignment of SP + SDO to the function. I'm right now testing a > much simpler patch that uses > REGNO_POINTER_ALIGN(VIRTUAL_STACK_DYNAMIC_REGNUM) as the > alignment. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
gcc/ChangeLog * explow.c (get_dynamic_stack_size): Take known alignment of stack pointer + STACK_DYNAMIC_OFFSET into account when calculating the size needed. Correct a typo in a comment.
>From 7d7a6b7fdb189759eb11b96c93ee4adfa8608a97 Mon Sep 17 00:00:00 2001 From: Dominik Vogt <v...@linux.vnet.ibm.com> Date: Fri, 29 Apr 2016 08:36:59 +0100 Subject: [PATCH] Reduce size allocated for run time allocated stack variables. The present calculation sometimes led to more stack memory being used than necessary with alloca. First, (STACK_BOUNDARY -1) would be added to the allocated size: size = plus_constant (Pmode, size, extra); size = force_operand (size, NULL_RTX); Then round_push was called and added another (STACK_BOUNDARY - 1) before rounding down to a multiple of STACK_BOUNDARY. On s390x this resulted in adding 14 before rounding down for "x" in the test case pr36728-1.c. The problem was that get_dynamic_stack_size did not take into account that the target might guarantee some alignment of (stack_pointer + STACK_DYNAMIC_OFFSET) even if the value of STACK_DYNAMIC_OFFSET is not known yet. --- gcc/explow.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/gcc/explow.c b/gcc/explow.c index a345690..f97a214 100644 --- a/gcc/explow.c +++ b/gcc/explow.c @@ -1224,9 +1224,15 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align, example), so we must preventively align the value. We leave space in SIZE for the hole that might result from the alignment operation. */ - extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT; - size = plus_constant (Pmode, size, extra); - size = force_operand (size, NULL_RTX); + unsigned known_align = REGNO_POINTER_ALIGN (VIRTUAL_STACK_DYNAMIC_REGNUM); + if (known_align == 0) + known_align = BITS_PER_UNIT; + if (required_align > known_align) + { + extra = (required_align - known_align) / BITS_PER_UNIT; + size = plus_constant (Pmode, size, extra); + size = force_operand (size, NULL_RTX); + } if (flag_stack_usage_info && pstack_usage_size) *pstack_usage_size += extra; @@ -1235,7 +1241,7 @@ get_dynamic_stack_size (rtx *psize, unsigned size_align, size_align = BITS_PER_UNIT; /* Round the size to a multiple of the required stack alignment. - Since the stack if presumed to be rounded before this allocation, + Since the stack is presumed to be rounded before this allocation, this will maintain the required alignment. If the stack grows downward, we could save an insn by subtracting -- 2.3.0