Geoffrey Keating <[EMAIL PROTECTED]> writes:

> The problem is that the code ignores all instructions in the
> prologue.  It happens, in eh1.C, that a stack adjustment (to enforce
> stack alignment) for the call is combined with a different stack
> adjustment (to allocate a local variable) in the prologue, by the
> 'csa' phase.  There's no way to get this right without knowing that
> this is happening, and as far as I can see there's no information
> available that it happened.  The final code looks like:
>
>         pushl   %ebp
>         movl    %esp, %ebp
>         pushl   %ebx
>         subl    $16, %esp       # the stack push that is combined
>         call    L6
> "L00000000001$pb":
> L6:
>         popl    %ebx            # last instruction in prolog
>         pushl   $4
>         call    L___cxa_allocate_exception$stub  # this routine has 16 bytes of arguments
>         movl    $99, (%eax)
>         addl    $12, %esp       # stack adjust preparing for next call
>         pushl   $0
>         pushl   L__ZTIi$non_lazy_ptr-"L00000000001$pb"(%ebx)
>         pushl   %eax
>         call    L___cxa_throw$stub
>
> The EH information says that __cxa_allocate_exception has 4 bytes of
> args, and __cxa_throw has 12 bytes of args; but really they both have
> 16 bytes of args.
>
> The effect of all this on Darwin is that the stack becomes misaligned
> and then the dynamic loader crashes when it tries to use SSE (or
> something).
>
> I think possible solutions are:
>
> 1. Revert this patch, and note that the original bug (which you
>    mention you couldn't reproduce on 4.0+) might not be fixed.
> 2. Remove the whole routine and declare that if you want to use EH,
>    you must not have ACCUMULATE_OUTGOING_ARGS set.
> 3. Try to patch around it by looking at the argument size in the
>    call, and if it's greater than the apparent argument size, using
>    the value in the call instead.
> 4. Make the routine much more intelligent, or ideally have the proper
>    information attached to the call at the time it's generated (as a
>    note?), and just use that.  Add gcc_assert calls to verify that
>    (a) the argument size never becomes negative and (b) the argument
>    size is never less than the size of each particular call.
>
> I prefer (1).  Next best would be (4), but it's likely to break some
> ports.  (3) doesn't sound very good, and (2) removes a feature.
>
> Any other suggestions?
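
The arithmetic behind the 4/12 versus 16/16 figures quoted above can be checked with a small model. This is only an illustrative sketch, not GCC code: the instruction table is transcribed from the listing, the split of the combined `subl $16` into 12 bytes of call alignment plus 4 bytes for the local variable is inferred from the numbers (16 real bytes minus the 4 pushed), and the clamp at zero in the prologue-ignoring tracker is an assumption chosen because it reproduces the 12 reported for __cxa_throw.

```python
# Illustrative model only; not GCC code.
# Each entry: (instruction text, bytes pushed (+) or popped (-),
#              in_prologue flag, callee name or None).
# ALIGN_PADDING is an inferred split of the combined "subl $16":
# 12 bytes of call alignment plus 4 bytes for the local variable.
ALIGN_PADDING = 12

INSNS = [
    ("pushl %ebp",        4, True,  None),
    ("movl %esp, %ebp",   0, True,  None),
    ("pushl %ebx",        4, True,  None),
    ("subl $16, %esp",   16, True,  None),  # combined adjustment
    ("call L6",           4, True,  None),  # pushes a return address
    ("popl %ebx",        -4, True,  None),  # last prologue instruction
    ("pushl $4",          4, False, None),
    ("call (allocate)",   0, False, "__cxa_allocate_exception"),
    ("movl $99, (%eax)",  0, False, None),
    ("addl $12, %esp",  -12, False, None),
    ("pushl $0",          4, False, None),
    ("pushl (typeinfo)",  4, False, None),
    ("pushl %eax",        4, False, None),
    ("call (throw)",      0, False, "__cxa_throw"),
]


def apparent_sizes(insns):
    """Argument bytes per call when prologue instructions are ignored."""
    size, sizes = 0, []
    for _text, delta, in_prologue, callee in insns:
        if in_prologue:
            continue
        size = max(size + delta, 0)  # assumed clamp at zero
        if callee is not None:
            sizes.append((callee, size))
    return sizes


def actual_sizes(insns, padding=ALIGN_PADDING):
    """Argument bytes per call once the folded-in padding is counted."""
    size, sizes = padding, []
    for _text, delta, in_prologue, callee in insns:
        if in_prologue:
            continue
        size += delta
        assert size >= 0, "argument size became negative"
        if callee is not None:
            sizes.append((callee, size))
    return sizes


if __name__ == "__main__":
    for (name, seen), (_, real) in zip(apparent_sizes(INSNS),
                                       actual_sizes(INSNS)):
        print(f"{name}: apparent {seen} bytes, actual {real} bytes")
```

In this model the `addl $12` drives the prologue-ignoring count below zero, which is the kind of condition the assertion proposed in (4a) would catch.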

We could change CSA to not combine a prologue instruction with a
non-prologue instruction.  Although that would remove a (minor)
optimization.

We could change CSA so that when it combines a prologue instruction
with a non-prologue instruction it resets the RTX_FRAME_RELATED flag.
That probably wouldn't work.

We could change CSA so that when it combines a prologue instruction
with a non-prologue instruction it sets a new flag on the instruction,
and uses a table on the side to record the original values in the
instruction.

We could avoid nesting memcpy calls on ACCUMULATE_OUTGOING_ARGS
machines, in which case I think Alex's patch is unnecessary.

I'm not sure about your option 1--Alex didn't say that he couldn't
reproduce the bug in mainline, he said he didn't have a test case for
the specific case of memcpy popping the arguments off the stack on
return.

Ian