http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519
--- Comment #6 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-06-30
15:11:41 UTC ---
I've looked into the tail-call optimization. It seems we should not apply it at
all if the new and old stack addresses of the parameters overlap. BTW, I think
that is too conservative anyway...
We have a call through a pointer that passes 5 params. The last param is of no
interest to us, but the first 4 are.
We have in expand:
GIMPLE snippet:
D.172468_17 = MEM[(struct cons &)&arg_refs + 12].head;
D.172469_18 = MEM[(struct cons &)&arg_refs + 8].head;
D.172470_19 = MEM[(struct cons &)&arg_refs + 4].head;
D.172471_20 = MEM[(struct cons &)&arg_refs];
D.172462_21 = (sizetype) fun_ptr$__delta_26;
D.172463_22 = obj_3(D) + D.172462_21;
fun_ptr$__pfn_23 (D.172463_22, D.172471_20, D.172470_19, D.172469_18,
D.172468_17); [tail call]
Expanding it subsequently, we get this RTL:
(insn 19 18 20 4 (set (reg/f:SI 80)
(mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 28 [0x1c])) [0 MEM[(struct cons &)&arg_refs +
12].head+0 S4 A32])) include/base/thread_management.h:1534 -1
(nil))
(insn 20 19 21 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 16 [0x10])) [0 S4 A32])
(reg/f:SI 80)) include/base/thread_management.h:1534 -1
(nil))
(insn 21 20 22 4 (set (reg/f:SI 81)
(mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 24 [0x18])) [0 MEM[(struct cons &)&arg_refs +
8].head+0 S4 A32])) include/base/thread_management.h:1534 -1
(nil))
(insn 22 21 23 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 12 [0xc])) [0 S4 A32])
(reg/f:SI 81)) include/base/thread_management.h:1534 -1
(nil))
(insn 23 22 24 4 (set (reg/f:SI 82)
(mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 20 [0x14])) [0 MEM[(struct cons &)&arg_refs +
4].head+0 S4 A32])) include/base/thread_management.h:1534 -1
(nil))
(insn 24 23 25 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 8 [0x8])) [0 S4 A32])
(reg/f:SI 82)) include/base/thread_management.h:1534 -1
(nil))
(insn 25 24 26 4 (parallel [
(set (reg:SI 83)
(plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 16 [0x10])))
(clobber (reg:CC 17 flags))
]) step-14.cc:4271 -1
(nil))
(insn 26 25 27 4 (set (reg/f:SI 84) <----
(mem/f/c:SI (reg:SI 83) [0 MEM[(struct cons &)&arg_refs]+0 S4 A32]))
include/base/thread_management.h:1534 -1 <----
(nil))
(insn 27 26 28 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
(const_int 4 [0x4])) [0 S4 A32])
(reg/f:SI 84)) include/base/thread_management.h:1534 -1
(nil))
(insn 28 27 29 4 (parallel [
(set (reg:SI 85)
(plus:SI (reg/v/f:SI 77 [ obj ])
(reg:SI 74 [ fun_ptr$__delta ])))
(clobber (reg:CC 17 flags))
]) include/base/thread_management.h:1534 -1
(nil))
(insn 29 28 30 4 (set (mem:SI (reg/f:SI 53 virtual-incoming-args) [0 S4 A32])
(reg:SI 85)) include/base/thread_management.h:1534 -1
(nil))
(call_insn/j 30 29 31 4 (call (mem:QI (reg/f:SI 59 [ fun_ptr$__pfn ]) [0
*fun_ptr$__pfn_23 S1 A8])
(const_int 20 [0x14])) include/base/thread_management.h:1534 -1
(nil)
(expr_list:REG_DEP_TRUE (use (mem/f/i:SI (reg/f:SI 53
virtual-incoming-args) [0 S4 A32]))
(expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53
virtual-incoming-args)
(const_int 4 [0x4])) [0 S4 A32]))
(expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53
virtual-incoming-args)
(const_int 8 [0x8])) [0 S4 A32]))
(expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53
virtual-incoming-args)
(const_int 12 [0xc])) [0 S4 A32]))
(expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI
53 virtual-incoming-args)
(const_int 16 [0x10])) [0 S4 A32]))
(nil)))))))
You can see that the address of the 4-th param is calculated in a different
way. We compute a sum, store it in a register, load memory from that address
and then put it on the new stack.
BUT. The predicate that checks for memory overlap looks like this:
static bool
mem_overlaps_already_clobbered_arg_p (rtx addr, unsigned HOST_WIDE_INT size)
{
  HOST_WIDE_INT i;
  if (addr == crtl->args.internal_arg_pointer)
    i = 0;
  else if (GET_CODE (addr) == PLUS
           && XEXP (addr, 0) == crtl->args.internal_arg_pointer
           && CONST_INT_P (XEXP (addr, 1)))
    i = INTVAL (XEXP (addr, 1));
  /* Return true for arg pointer based indexed addressing.  */
  else if (GET_CODE (addr) == PLUS
           && (XEXP (addr, 0) == crtl->args.internal_arg_pointer
               || XEXP (addr, 1) == crtl->args.internal_arg_pointer))
    return true;
  else
    return false;
.....
You can see that if we have a load whose address does not look like (esp+*),
the routine always reports that there is no overlap.
That is why the tail call is applied when it must not be.