http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519
--- Comment #6 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-06-30 15:11:41 UTC ---
I've looked into the tail-call optimization. It seems we must not apply it at all when the new and old stack addresses of the parameters overlap. BTW, I think it is too conservative anyway...

We have a call through a pointer that passes 5 params. The last param is of no interest to us, but the first 4 are. In expand we have this GIMPLE snippet:

  D.172468_17 = MEM[(struct cons &)&arg_refs + 12].head;
  D.172469_18 = MEM[(struct cons &)&arg_refs + 8].head;
  D.172470_19 = MEM[(struct cons &)&arg_refs + 4].head;
  D.172471_20 = MEM[(struct cons &)&arg_refs];
  D.172462_21 = (sizetype) fun_ptr$__delta_26;
  D.172463_22 = obj_3(D) + D.172462_21;
  fun_ptr$__pfn_23 (D.172463_22, D.172471_20, D.172470_19, D.172469_18, D.172468_17); [tail call]

Subsequently expanding it, we get this RTL:

(insn 19 18 20 4 (set (reg/f:SI 80)
        (mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 28 [0x1c])) [0 MEM[(struct cons &)&arg_refs + 12].head+0 S4 A32])) include/base/thread_management.h:1534 -1
     (nil))

(insn 20 19 21 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 16 [0x10])) [0 S4 A32])
        (reg/f:SI 80)) include/base/thread_management.h:1534 -1
     (nil))

(insn 21 20 22 4 (set (reg/f:SI 81)
        (mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 24 [0x18])) [0 MEM[(struct cons &)&arg_refs + 8].head+0 S4 A32])) include/base/thread_management.h:1534 -1
     (nil))

(insn 22 21 23 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 12 [0xc])) [0 S4 A32])
        (reg/f:SI 81)) include/base/thread_management.h:1534 -1
     (nil))

(insn 23 22 24 4 (set (reg/f:SI 82)
        (mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 20 [0x14])) [0 MEM[(struct cons &)&arg_refs + 4].head+0 S4 A32])) include/base/thread_management.h:1534 -1
     (nil))

(insn 24 23 25 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 8 [0x8])) [0 S4 A32])
        (reg/f:SI 82)) include/base/thread_management.h:1534 -1
     (nil))

(insn 25 24 26 4 (parallel [
            (set (reg:SI 83)
                (plus:SI (reg/f:SI 53 virtual-incoming-args)
                    (const_int 16 [0x10])))
            (clobber (reg:CC 17 flags))
        ]) step-14.cc:4271 -1
     (nil))

(insn 26 25 27 4 (set (reg/f:SI 84)                                                  <----
        (mem/f/c:SI (reg:SI 83) [0 MEM[(struct cons &)&arg_refs]+0 S4 A32])) include/base/thread_management.h:1534 -1   <----
     (nil))

(insn 27 26 28 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 4 [0x4])) [0 S4 A32])
        (reg/f:SI 84)) include/base/thread_management.h:1534 -1
     (nil))

(insn 28 27 29 4 (parallel [
            (set (reg:SI 85)
                (plus:SI (reg/v/f:SI 77 [ obj ])
                    (reg:SI 74 [ fun_ptr$__delta ])))
            (clobber (reg:CC 17 flags))
        ]) include/base/thread_management.h:1534 -1
     (nil))

(insn 29 28 30 4 (set (mem:SI (reg/f:SI 53 virtual-incoming-args) [0 S4 A32])
        (reg:SI 85)) include/base/thread_management.h:1534 -1
     (nil))

(call_insn/j 30 29 31 4 (call (mem:QI (reg/f:SI 59 [ fun_ptr$__pfn ]) [0 *fun_ptr$__pfn_23 S1 A8])
        (const_int 20 [0x14])) include/base/thread_management.h:1534 -1
     (nil)
    (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (reg/f:SI 53 virtual-incoming-args) [0 S4 A32]))
        (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                        (const_int 4 [0x4])) [0 S4 A32]))
            (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                            (const_int 8 [0x8])) [0 S4 A32]))
                (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                                (const_int 12 [0xc])) [0 S4 A32]))
                    (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                                    (const_int 16 [0x10])) [0 S4 A32]))
                        (nil)))))))

You can see that the address of the 4th param (insn 26, marked with <----) is calculated in a different way. We compute a sum and store it in a register (insn 25), load memory from that address, and then put the value on the new stack. Note that insn 20 has already stored to virtual-incoming-args + 16, which is exactly the location insn 26 reads through reg 83.

BUT.
The predicate which checks for memory overlap looks like this:

static bool
mem_overlaps_already_clobbered_arg_p (rtx addr, unsigned HOST_WIDE_INT size)
{
  HOST_WIDE_INT i;

  if (addr == crtl->args.internal_arg_pointer)
    i = 0;
  else if (GET_CODE (addr) == PLUS
           && XEXP (addr, 0) == crtl->args.internal_arg_pointer
           && CONST_INT_P (XEXP (addr, 1)))
    i = INTVAL (XEXP (addr, 1));
  /* Return true for arg pointer based indexed addressing.  */
  else if (GET_CODE (addr) == PLUS
           && (XEXP (addr, 0) == crtl->args.internal_arg_pointer
               || XEXP (addr, 1) == crtl->args.internal_arg_pointer))
    return true;
  else
    return false;
  .....

You can see that if the load address is not an arg-pointer-based expression such as (esp+*), the routine always reports that there is no overlap. That is why the tail call is applied when it must not be.
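To make the failure mode concrete, here is a toy model in C. It is only a sketch and not GCC code: `toy_addr`, `mark_store`, `toy_overlaps`, `really_overlaps` and the slot bitmap are all hypothetical names, and the part of the real predicate that is elided above is replaced by a plain bitmap lookup, which is an assumption. The point is just that a purely syntactic check on the address shape cannot see through an address held in a pseudo register.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in GCC. */
enum { SLOT = 4, NSLOTS = 8 };

enum addr_kind {
  ADDR_ARGP,            /* (reg argp)                         */
  ADDR_ARGP_PLUS_CONST, /* (plus (reg argp) (const_int N))    */
  ADDR_ARGP_PLUS_REG,   /* (plus (reg argp) (reg R))          */
  ADDR_OTHER_REG        /* (reg R), like reg 83 in insn 26    */
};

struct toy_addr {
  enum addr_kind kind;
  long offset;      /* syntactic offset, used for ARGP_PLUS_CONST only */
  long true_offset; /* real offset from argp, visible to the model only */
};

/* One flag per argument slot that an earlier store has overwritten. */
static bool clobbered[NSLOTS];

void mark_store(long offset)
{
  assert(offset >= 0 && offset / SLOT < NSLOTS);
  clobbered[offset / SLOT] = true;
}

/* Same decision shape as the quoted prefix of the predicate: it looks
   only at the syntactic form of the address.  The final lookup stands
   in for the elided tail and is an assumption. */
bool toy_overlaps(struct toy_addr a)
{
  long i;

  if (a.kind == ADDR_ARGP)
    i = 0;
  else if (a.kind == ADDR_ARGP_PLUS_CONST)
    i = a.offset;
  else if (a.kind == ADDR_ARGP_PLUS_REG)
    return true;   /* indexed off argp: assume the worst */
  else
    return false;  /* address in some other register: assumed safe */

  return clobbered[i / SLOT];
}

/* Ground truth for the model: uses the real offset, which a syntactic
   check cannot know. */
bool really_overlaps(struct toy_addr a)
{
  return clobbered[a.true_offset / SLOT];
}
```

With `mark_store(16)` standing in for insn 20, an address of kind `ADDR_ARGP_PLUS_CONST` with offset 16 is reported as overlapping, while an `ADDR_OTHER_REG` address whose `true_offset` is also 16 (the insn 25/26 pattern) is reported as safe even though `really_overlaps` says otherwise.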