Re: random commentary on -fsplit-stack (and a bug report)

Jay Freeman (saurik) Tue, 28 Feb 2012 13:58:50 -0800

> > "Jay Freeman (saurik)" <sau...@saurik.com>
> "Ian Lance Taylor" <i...@google.com>


> Thanks for the bug report and the analysis.  I think it does simply
> require an '&'.  That makes it analogous to the way
> __morestack_release_segments is used in generic-morestack-thread.c. 

The only reason I hesitated on that is that it might not make sense to update 
the pointer in the context. In my specific case, that will actually cause it to 
crash ;P, as while the current stack I'm calling __splitstack_releasecontext 
from is valid, the context pointer I'm passing is actually stored on the old 
stack, and will be deallocated by __morestack_release_segments.

I can always just change my code to copy the context to the other stack before 
calling __splitstack_releasecontext, however, so that isn't a problem for me. 
Though, I also wasn't certain what the releasecontext function actually wanted 
to do with that pointer, as I hadn't yet read much of the morestack code; I now 
see that it is just the head of a linked list, so yeah: passing the address out 
of the context seems fine.
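
To make the "copy the context to the other stack first" workaround concrete, here is a toy model of it. Nothing below is the libgcc API: Segment, Context, release_segments and release_safely are hypothetical stand-ins, and the only thing carried over from the discussion is that the release routine takes the address of the list head and writes through it.

```cpp
#include <cassert>
#include <cstddef>

struct Segment;

// Toy stand-in for a split-stack context: just the head of the segment list.
struct Context {
    Segment *segments;
};

// Toy stand-in for one stack segment. `stored` models a context that happens
// to live in this segment's memory (i.e. "on the old stack").
struct Segment {
    Segment *next;
    Context stored;
};

// Builds a singly linked list of `count` segments and returns its head.
Segment *make_segments(std::size_t count) {
    Segment *head = nullptr;
    for (std::size_t i = 0; i < count; ++i)
        head = new Segment{head, Context{nullptr}};
    return head;
}

// Frees every segment, then clears the head through the pointer it was given.
// If `head` pointed into one of the segments, that final store would be a
// write into freed memory. Returns the number of segments freed.
std::size_t release_segments(Segment **head) {
    std::size_t freed = 0;
    Segment *segment = *head;
    while (segment != nullptr) {
        Segment *next = segment->next;
        delete segment;
        ++freed;
        segment = next;
    }
    *head = nullptr;
    return freed;
}

// Workaround: copy the context to storage on the live stack before releasing,
// so the store in release_segments lands in memory that is still valid.
std::size_t release_safely(const Context *on_old_stack) {
    assert(on_old_stack != nullptr);
    Context copy = *on_old_stack;
    return release_segments(&copy.segments);
}
```

The copy is a single pointer in this model; with the real context it would be the whole array, but the ordering (copy first, release second) is the point.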

> As you know, I wanted to allow for future expansion.  I agree that it
> would be possible to avoid storing MORESTACK_SEGMENTS--that would trade
> off space for time, since it would mean that setcontext would have to
> walk up the list.  I think CURRENT_STACK is required for
> __splitstack_find_context.  And __splitstack_find_context is required
> for Go's garbage collector.  At least, it's not obvious to me how to
> avoid requiring CURRENT_STACK for that case.

The basis of that suggestion was not just that the items in the context could 
be removed, but that the underlying state used by split stacks might not need 
the values at all. In this case, I am not certain why __morestack_segments is 
needed: it seems to only come into play when __morestack_current_segment is 
NULL (and I'm not certain how that would happen) and while deallocating dynamic 
blocks (which is already linear).

I might provide a patch to better describe what I mean by this. I've started 
the process of getting a copyright assignment in place (sent an e-mail to 
fsf-reco...@gnu.org per http://gcc.gnu.org/wiki/CopyrightAssignment).

> I agree.  Want to write a patch?  Or at least file a bug report.

Sure.

> [paragraph moved below]

> > 7) Using the linker to handle the transition between split-stack and
> > non-split-stack code seems like a good way to solve the problem of "we
> > need large stacks when hitting external code", but in staring at the
> > resulting code I have in my project I'm seeing that it isn't reliable:
> > if you have a pointer to a function the linker will not know what you
> > are calling. In my case, this is coming up often due to using
> > std::function.
> 
> Yes, good point.  I think I had some plan for handling that but I no
> longer recall what it was.

After getting more sleep, I realize that this problem is actually much more 
endemic than I had even previously thought. Most any vaguely object-oriented 
library is going to have tons of function pointers in it, and you often 
interact solely with those function pointers (as in, you have no actual symbol 
references anywhere). A simple example: in the case of C++, any call to a 
non-split-stack virtual function will fail.

"""Function pointers are a tricky case. In general we don't know whether a 
function pointer points to split-stack code. Therefore, all calls through a 
function pointer will be modified to call (or jump to) a special function 
__fnptr_morestack. This will use a target specific function calling sequence, 
and will be implemented as though it were itself a function call instruction. 
That is, all the parameters will be set up, and then the code will jump to 
__fnptr_morestack. The __fnptr_morestack function takes two parameters: the 
function pointer to call, and the number of bytes of arguments pushed on the 
stack. (This is not yet implemented.)"""

That paragraph is from your design document (SplitStacks on the GCC wiki). I 
presume that this solution would only work if __fnptr_morestack always assumed 
that the target did not support split-stack? Alternatively, I can see having 
that stub look at the function to see if its first instruction was a comparison 
to the TCB stack limit entry (using similar logic to that used by the linker)? 
[also, see below in this e-mail]

> > More awkwardly, split-stack functions that mention (but do not call)
> > non-split-stack functions (such as to return their address) are being
> > mis-flagged by the linker. Honestly, I question whether the linker
> > fundamentally has enough information about what is going on to be able
> > to make sufficiently accurate decisions with regards to stack
> > constraints to warrant the painful abstraction breakage that
> > split-stack uses. :(
> 
> You're right that the linker doesn't really have enough information.
> But is a split-stack function that returns the address of a
> non-split-stack function really so frequent that it's worth worrying
> about?

I guess the question I have is: is one of the goals to make this option "safe 
to turn on for a random project"? Given the abstraction break that was made 
between the compiler and the linker, it would seem like this was a rather 
critically important goal (as now both the linker and the compiler are less 
modular and more difficult to modify), but in fact the result doesn't manage to 
solve seemingly simple corner cases.

The reason I'm running into these issues is not due to virtual dispatch (at 
least yet: this codebase was C 5 years ago, but is now being ported to C++), 
but instead due to higher-order functions. I'm finding myself in situations 
where std::function and std::bind are disconnecting the symbol references from 
the call sites sufficiently (even moving them to different stacks ;P) to cause 
the linker to make seemingly random decisions.

That said, I can demonstrate a really common idiom, from C (not C++), that is 
almost always going to involve non-split-stack code (as malloc and free are 
normally going to be in libc, compiled without -fsplit-stack), and that is 
morally equivalent to "returning a function pointer and using it later": data 
structures that keep information on a block of dynamically allocated memory and 
"how to free it". Here's a lame version:

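#include <stdbool.h>
#include <stdlib.h>  /* free, NULL */
#include <string.h>  /* strdup (POSIX) */

typedef struct String {
    const char *data;
    void (*free)(void *);   /* how to release data, or NULL if not owned */
} String;

void ClearString(String *string) {
    /* indirect call: no symbol reference to free() at this call site */
    if (string->data != NULL && string->free != NULL)
        string->free((void *) string->data);

    string->data = NULL;
    string->free = NULL;
}

void SetString(String *string, const char *data, bool alloc) {
    ClearString(string);

    string->data = data;
    string->free = alloc ? &free : NULL;
}

void f(String *string) {
    SetString(string, strdup("hello"), true); /* heap copy, so free() is valid */
    ClearString(string); /* potential stack overflow */
}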

(Incidentally, if you use std::vector with a custom allocator that has any kind 
of indirection in it, this is going to come up quite a lot. The code for vector 
instantiated over your allocator will be compiled as part of your code with 
-fsplit-stack, but if the memory allocator being used is something compiled 
without then you are going to end up with a really complex version of the above 
code and a stack overflow.)
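
For what it's worth, here is roughly what I have in mind by "any kind of indirection", as a sketch only. IndirectAllocator is a hypothetical name; the function pointers default to malloc and free, standing in for whatever non-split-stack allocator is really behind them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical allocator that reaches the real allocator only through
// function pointers, so the vector code instantiated over it contains no
// direct symbol reference to malloc or free.
template <typename T>
struct IndirectAllocator {
    using value_type = T;

    void *(*allocate_fn)(std::size_t) = &std::malloc;
    void (*free_fn)(void *) = &std::free;

    IndirectAllocator() = default;

    template <typename U>
    IndirectAllocator(const IndirectAllocator<U> &other)
        : allocate_fn(other.allocate_fn), free_fn(other.free_fn) {}

    T *allocate(std::size_t count) {
        assert(count <= SIZE_MAX / sizeof(T)); // guard the multiplication
        void *memory = allocate_fn(count * sizeof(T)); // indirect call
        if (memory == nullptr)
            throw std::bad_alloc();
        return static_cast<T *>(memory);
    }

    void deallocate(T *pointer, std::size_t) {
        free_fn(pointer); // indirect call
    }
};

template <typename T, typename U>
bool operator==(const IndirectAllocator<T> &a, const IndirectAllocator<U> &b) {
    return a.allocate_fn == b.allocate_fn && a.free_fn == b.free_fn;
}

template <typename T, typename U>
bool operator!=(const IndirectAllocator<T> &a, const IndirectAllocator<U> &b) {
    return !(a == b);
}

// Every reallocation inside push_back goes through the two pointers above.
std::vector<int, IndirectAllocator<int>> make_values(int count) {
    std::vector<int, IndirectAllocator<int>> values;
    for (int i = 0; i < count; ++i)
        values.push_back(i);
    return values;
}
```

All of the std::vector code here would be compiled with -fsplit-stack as part of my project, yet each growth step lands in the allocator through a pointer.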

[paragraph from above]
> It would certainly be possible for the compiler to arrange to allocate a
> large stack as it called the non-split-stack function.  Unfortunately, I
> don't see how the linker could do it.  And it's the linker, not the
> compiler, that knows that it is a call to a non-split-stack function.

However, the linker doesn't actually have any notion of "calls", which is what 
causes the previous problems. In a language like C++ (or even C) it isn't 
really true that a function that calls another function will go through a 
symbol reference to do it. Anyone who uses code that involves dlsym, 
higher-order functions, or polymorphic object-oriented libraries will run into 
cases that the current -fsplit-stack implementation doesn't even provide good 
(certainly not documented) workarounds for.

Part of me (and I realize that this causes other tradeoffs, and I'm therefore 
not even recommending it: more just musing) feels like the notion of "supports 
split stack" is more of a calling convention. In the same way that gcc 
currently supports regparm, stdcall, thiscall, fastcall... it seems like it 
might simply be a new attribute (probably orthogonal to the calling convention) 
a function can have (and would not have by default): splitcall.

In such an implementation, like many of the existing calling-convention related 
attributes, splitcall would be considered part of the type signature (and 
thereby would not be allowed to be put on a definition and not on the related 
prototype), and could be opted in for a large block of code using a #pragma or 
a compiler-switch. (Again: this is just musing. I haven't put much thought into 
whether this would actually be semantically reasonable yet.)

[see above in this e-mail] For cases where the compiler "simply doesn't know", 
the solution that was brought up for function pointers could be used: have a 
level of indirection in the calls that includes the number of arguments. That 
code could then read the target of the call to see whether the function at the 
other side looked like it supported split-stack, and if not it could allocate 
more stack at the time of that call.
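
A rough sketch of the "look at the target" check, under heavy assumptions: it only knows the x86-64 byte sequence for cmp %fs:0x70,%rsp that appears in the disassembly later in this e-mail, and the function name is made up. A real check would need to recognize every prologue form the compiler can emit, on every target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Encoding of `cmp %fs:0x70,%rsp` on x86-64, copied from the disassembly in
// this message. Treating it as *the* split-stack marker is an assumption.
static const unsigned char kStackLimitCompare[] = {
    0x64, 0x48, 0x3b, 0x24, 0x25, 0x70, 0x00, 0x00, 0x00,
};

// Returns true if the code at `code` starts with the stack-limit comparison,
// i.e. the target looks like it was compiled with -fsplit-stack.
bool looks_split_stack(const unsigned char *code, std::size_t length) {
    assert(code != nullptr);
    return length >= sizeof(kStackLimitCompare) &&
           std::memcmp(code, kStackLimitCompare,
                       sizeof(kStackLimitCompare)) == 0;
}
```

The stub would run this on the function pointer's target and only allocate the large stack when it returns false.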

The developer would now be put in the position of thinking about what they are 
calling sometimes (and making certain that their usage of the pragma and header 
files lined up), but honestly I already am having to think about that (due to 
the linker having both false positives and false negatives for all of the above 
reasons, whether from the inliner conflating calls or function pointers 
obfuscating them), and I have no explicit mechanism to override it.

In fact, I almost want to say that the worst-case scenario in the "rely on the 
compiler" approach is the developer throwing up their hands in defeat and attempting to 
recompile "the world" (including libgcc, libsupc++, etc.) with -fsplit-stack... 
but that's where I already am at with the current linker-based implementation: 
the main/only way I'm going to be able to avoid having function pointers to 
non-split-stack code is to recompile every library I need with -fsplit-stack.

> > A specific idea that might help, however, is to set things up so that
> > the PLT actually handles the stack increases when you are linking to
> > functions that are in a dynamic library. That way, calls to open (for
> > example) would not cause the function that called it to suddenly
> > require a large stack, but instead only as control is transferred to
> > open would the stack size increase. (This might be quite complex,
> > though.)
> 
> Yes, again you have to know how many bytes of arguments were pushed on
> the stack.  You can pretty much know this for open, of course, but it's
> a lot more complex for printf (if printf were compiled in split-stack
> mode it would be straightforward, but of course in this example it is
> not).
> 
> I agree that this could be a lot nicer.  It's a bit less important for
> Go because obviously the Go compiler is completely in control of all
> functions called by Go code.

In this model (still using the linker, but pushing the stack-split into or 
around the @plt stub function), I would have to propose that variadic functions 
are treated specially (possibly using a similar/identical setup to the one you 
were proposing for function pointers) where the argument count was also passed. 
This could be pushed onto the stack right before the call and popped/thrown off 
the stack first thing in the stub when not needed (which has the benefit of 
being portable between targets and not messing with the existing argument 
placement).

> > That said, I don't have a better solution to suggest right now (I
> > really want to say that having attributes available to declare
> > split-stack functionality in the code would be better, but that has
> > other ramifications), but I do have concerns that due to attempts to
> > keep the ABI fixed decisions made now (when there seem to only be a
> > single major user, Go) will lock in how the mechanism is capable of
> > functioning in the future.
> 
> I may misunderstand your suggestion, but I think that keeping the ABI
> fixed is a requirement.  Any ABI change would require rebuilding all
> libraries and changing the debugger.  The result would not be usable
> for most people.

What I meant by these "concerns" is that it seems like the current mechanism 
for -fsplit-stack is going to get locked into place (and be unable to change 
due to ABI breakage) in a state where even with the abstraction leak between 
the compiler and the linker (which I feel is quite costly) it doesn't really 
solve the problem for many users (and possibly even any other than Go, which 
might have enough constraints to make this work).

However, this might not actually be that big of a concern, thinking about it 
more. As the existing implementation is, as we both believed, unlikely to be 
incorporated into the default library build, it is really then just a matter 
that -fsplit-stack has to exist with this implementation and Gold needs to 
continue supporting it; gcc already has tons of ld-specific flags: this 
could just be another one. A later/different implementation of split-stack 
could be -fsplit-stack-ex or something, and exist independently and in 
parallel.

[vaguely in reply to everything above]

Actually, thinking about it more: it seems like 99% of these problems could be 
solved by providing a second symbol definition for the split-stack prologue and 
binding that as part of the type signature. So, you could either call the 
"original implementation" of a function using its normal symbol, or you could 
call the split-stack prologue version of the same function using one that had 
been mangled with some prefix.

extern "C" int test() {
    return 0xdeadbeef;
}

0000000000404920 <test>:
  404920:       64 48 3b 24 25 70 00    cmp    %fs:0x70,%rsp
  404927:       00 00 
  404929:       72 06                   jb     404931 <test+0x11>
  40492b:       b8 ef be ad de          mov    $0xdeadbeef,%eax
  404930:       c3                      retq   
  404931:       45 31 d2                xor    %r10d,%r10d
  404934:       45 31 db                xor    %r11d,%r11d
  404937:       e8 6d 6b 00 00          callq  40b4a9 <__morestack>
  40493c:       c3                      retq   
  40493d:       eb ec                   jmp    40492b <test+0xb>
  40493f:       90                      nop

In this case (and yes: this is an example of a function that shouldn't need 
this prologue at all, but it was short ;P), the existing implementation of 
-fsplit-stack has modified the function to fundamentally check its stack. No 
matter how you attempt to call it, we now have to know whether the function 
supports the split-stack protocol using an out-of-line mechanism, and we cannot 
enforce our beliefs in the compiler: the linker is in complete control of this 
decision. However, we could instead have it do this:

0000000000404920 <.split.test>:
  404920:       64 48 3b 24 25 70 00    cmp    %fs:0x70,%rsp
  404927:       00 00 
  404929:       72 06                   jb     404931 <test+0x6>
000000000040492b <test>:
  40492b:       b8 ef be ad de          mov    $0xdeadbeef,%eax
  404930:       c3                      retq   
  404931:       45 31 d2                xor    %r10d,%r10d
  404934:       45 31 db                xor    %r11d,%r11d
  404937:       e8 6d 6b 00 00          callq  40b4a9 <__morestack>
  40493c:       c3                      retq   
  40493d:       eb ec                   jmp    40492b <test>
  40493f:       90                      nop

Now the decision to call either test or .split.test becomes explicit. This 
would allow us to get a linker error if we made an incorrect decision in my 
earlier not-really-a-suggestion-more-of-a-musing of making this knowledge 
explicit in the compiler akin to a calling convention. If the compiler decided 
that something wasn't split-stack, then it would just handle allocating the 
larger stack before the call to the underlying function; or, if it decided the 
function was split-stack, the linker would enforce it, and the user would get a 
reasonable error.
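
In source terms, the two entry points behave something like the toy model below. Everything in it is hypothetical: the stack is simulated with a counter, answer plays the role of the plain test symbol, split_answer plays the role of .split.test, and call_plain is what a caller would do when it believes the target is not split-stack.

```cpp
#include <cassert>
#include <cstddef>

// Simulated bookkeeping standing in for the real stack limit.
std::size_t stack_remaining = 64; // bytes left on the current segment
int grow_calls = 0;               // how many times more stack was requested

void grow_stack(std::size_t needed) {
    stack_remaining += needed;
    ++grow_calls;
}

// Role of the plain symbol: just the body, with no stack check at all.
int answer() { return 42; }

// Role of the prefixed symbol: do the split-stack check, then run the body.
int split_answer() {
    const std::size_t frame = 128; // made-up frame size
    if (stack_remaining < frame)
        grow_stack(frame);
    assert(stack_remaining >= frame);
    return answer();
}

// A caller that knows the target has no split-stack entry point reserves a
// large stack itself and then calls the plain symbol.
int call_plain(int (*function)(), std::size_t reserve) {
    if (stack_remaining < reserve)
        grow_stack(reserve);
    return function();
}
```

The choice between the two entry points is made at the call site, by name, which is what lets a wrong guess show up as an undefined symbol at link time.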

This also has the amazing benefit that it no longer would exact a runtime 
performance cost for libraries such as libc, libsupc++, and libgcc to be 
compiled with -fsplit-stack. The existing people who were using the library 
would continue to call the original version of the code, and only people who 
were trying to opt-in to the split-stack universe would be using the new 
split-stack variations. You could imagine an entire distribution of Linux (or 
whatever) that was compiled with this feature active, and the result would only 
be a slight increase in code-size.

So: thoughts? I really do think that there is a way to do this -fsplit-stack 
feature that allows more people to use it and for it to "change everything" / 
"take over the world" ;P. In my eyes, doing that would require a) that there is 
no impediment to just compiling everything with -fsplit-stack and b) that the 
functionality works with a stock linker (as many platforms are not supported 
by Gold). I think that with some variation on some of my above implementation 
ideas, this feature could be done generically in the compiler (and libgcc) for 
all platforms.

(Obviously, though: this is something I've only been looking at for days, and 
only seriously thinking about the implementation concerns of for hours, so I 
could easily be overlooking something obvious that you thought about two years 
ago that makes any or all of these ideas untenable. I certainly will not be 
bothered to learn that this is all stupid, and in fact will highly appreciate 
the feedback. Again: thank you so much for even reading any of these thoughts 
in the first place. ;P)

Sincerely,
Jay Freeman (saurik)
sau...@saurik.com
