http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10474

--- Comment #20 from Ryan Johnson <scovich at gmail dot com> ---
Hi Martin,

(PM reply because I don't have up-to-date information to file a proper 
bug report with)

On 25/11/2013 9:57 AM, jamborm at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10474
>
> --- Comment #19 from Martin Jambor <jamborm at gcc dot gnu.org> ---
> (In reply to Ryan Johnson from comment #18)
>> Great! Does this mean shrink-wrapping will be in gcc-4.9, at least for
>> x86_64 and ppc64?
> Well, a fairly basic (but not altogether unreasonable) shrink-wrapping
> "was" in gcc 4.8 (and earlier versions) too and that has not changed
> at all.  The problem with this and similar testcases was that the
> register allocator made decisions which made shrink-wrapping
> impossible (or at least too difficult to perform).  The change I
> committed and which will be a part of gcc 4.9 fixes this for a class
> of pseudo-registers which commonly result in this problem but other
> cases will still remain unresolved, for example PR 51982.  For some
> statistics about what impact the implemented technique has, see the
> email accompanying the first submission of the patch:
> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01719.html
>
> If you find another similar example which is important and clearly
> possible to shrink-wrap but we don't do it, feel free to submit a
> new missed-optimization bug and CC me.
>
One that comes to mind right away, though it is from several years ago
and possibly no longer true: on platforms like Solaris/sparc, accessing
thread-local storage requires a function call to retrieve the TLS base,
and the compiler seems to emit that call once, in the function
prologue. I strongly suspect (but can't confirm, since I no longer have
access to Solaris/sparc) that such a function-call-in-prologue would
confound subsequent efforts at shrink-wrapping. I don't know how often
this sort of scenario arises any more, though. It may be that the new
emutls stuff has changed everything, because on Cygwin with gcc-4.8 I
now see separate calls into emutls for every TLS access.
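
To make the TLS scenario concrete, here is a minimal sketch of the kind
of function I have in mind. The names (counter, bump_if) are made up for
illustration, and whether the TLS base lookup lands in the prologue or
at each access depends on the target and gcc version, so treat this as
something to inspect rather than a confirmed reproducer:

```c
/* Thread-local counter; __thread is the GCC extension for TLS. */
static __thread int counter;

/* The early return never touches TLS.  On targets where a TLS access
   needs a helper call, a base lookup hoisted into the prologue would
   also be paid on this path and would force a frame there. */
int bump_if(int cond)
{
    if (!cond)
        return -1;      /* fast path: no TLS access */
    counter += 1;       /* first TLS access */
    counter += 1;       /* second TLS access, same base */
    return counter;
}
```

Compiling with "gcc -O2 -S" and checking where the TLS helper call (if
any) appears relative to the "if (!cond)" test would show which of the
two behaviours a given target has.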

As for PR 51982, it looks like having flow-sensitive local analysis 
could go a long way: just as it can be useful to know that an "escaped" 
pointer has not *yet* escaped (e.g. PR 50346), here it would be useful 
to know that the stack frame, though perhaps eventually needed, is not 
needed just yet. Then, generation of the stack frame can be pushed down 
to the first basic block(s) where the need for a stack frame is 
undisputed, after any conditions that gate it. But I've been told that 
teaching gcc to think that way would not be easy...
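
For reference, the general shape of a shrink-wrapping candidate looks
something like the sketch below. This is not the testcase from either
PR; slow_sum and sum_or_zero are hypothetical names, and whether gcc
actually delays the prologue here depends on the version, target and
register allocation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper; noinline keeps the calls visible so the slow
   path really does need a value kept live across a call. */
__attribute__((noinline))
static long slow_sum(const long *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

long sum_or_zero(const long *v, size_t n)
{
    if (n == 0)         /* gating condition: no frame needed here */
        return 0;
    assert(v != NULL);
    /* From here on a frame is undisputed: 'a' is live across the
       second call, so it needs a callee-saved register or a spill. */
    long a = slow_sum(v, n / 2);
    long b = slow_sum(v + n / 2, n - n / 2);
    return a + b;
}
```

With "gcc -O2 -S" (-fshrink-wrap is enabled at -O and higher), the
thing to look for is whether the register saves appear before or after
the "n == 0" test.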

In any case, thanks for the improvement to a hairy problem.

Regards,
Ryan
