Geoffrey Keating <[EMAIL PROTECTED]> writes:

> The problem is that the code ignores all instructions in the
> prologue.  It happens, in eh1.C, that a stack adjustment (to enforce
> stack alignment) for the call is combined with a different stack
> adjustment (to allocate a local variable) in the prologue, by the
> 'csa' phase.  There's no way to get this right without knowing that
> this is happening, and as far as I can see there's no information
> available that it happened.  The final code looks like:
> 
>          pushl   %ebp
>          movl    %esp, %ebp
>          pushl   %ebx
>          subl    $16, %esp            # the stack push that is combined
>          call    L6
> "L00000000001$pb":
> L6:
>          popl    %ebx                          # last instruction in prolog
>          pushl   $4
>          call    L___cxa_allocate_exception$stub              # this routine has 16 bytes of arguments
>          movl    $99, (%eax)
>          addl    $12, %esp            # stack adjust preparing for next call
>          pushl   $0
>          pushl   L__ZTIi$non_lazy_ptr-"L00000000001$pb"(%ebx)
>          pushl   %eax
>          call    L___cxa_throw$stub
> 
> The EH information says that __cxa_allocate_exception has 4 bytes of
> args, and __cxa_throw has 12 bytes of args; but really they both have
> 16 bytes of args.
> 
> The effect of all this on Darwin is that the stack becomes misaligned
> and then the dynamic loader crashes when it tries to use SSE (or
> something).
> 
> I think possible solutions are:
> 
> 1. Revert this patch, and note that the original bug (which you
> mention you couldn't reproduce on 4.0+) might not be fixed.
> 2. Remove the whole routine and declare that if you want to use EH,
> you must not have ACCUMULATE_OUTGOING_ARGS set.
> 3. Try to patch around it by looking at the argument size in the
> call, and if it's greater than the apparent argument size, using the
> value in the call instead.
> 4. Make the routine much more intelligent, or ideally have the proper
> information attached to the call at the time it's generated (as a
> note?), and just use that.  Add gcc_assert calls to verify that (a)
> the argument size never becomes negative and (b) the argument size is
> never less than the size of each particular call.
> 
> I prefer (1).  Next best would be (4), but it's likely to break some
> ports.  (3) doesn't sound very good, and (2) removes a feature.
> 
> Any other suggestions?
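
To make the arithmetic in the listing concrete, here is a rough model of
the bookkeeping, not the actual GCC routine.  The struct, the function
name, and the event table are all made up for illustration.  Two things
are assumptions: that the running size is clamped at zero when it would
go negative, and that 12 of the 16 bytes in the combined subl belong to
the call (inferred from the 4-byte push plus the stated 16-byte total).
The call L6 / popl %ebx pair nets to zero and is left out.

```c
#include <assert.h>

/* Hypothetical event model: either a stack adjustment or a call site. */
enum kind { ADJUST, CALL };

struct event {
    enum kind kind;
    int bytes;        /* > 0: stack grows (push/sub); < 0: stack shrinks (add) */
    int in_prologue;  /* nonzero if the instruction belongs to the prologue */
};

/* Record the pending argument size at each call into sizes[].
   With skip_prologue set, prologue adjustments are ignored.
   Clamping at zero is an assumption of this sketch.
   Returns the number of calls seen. */
static int args_at_calls(const struct event *ev, int n,
                         int skip_prologue, int *sizes)
{
    int pending = 0, ncalls = 0;

    assert(ev && sizes);
    for (int i = 0; i < n; i++) {
        if (ev[i].kind == CALL) {
            sizes[ncalls++] = pending;
            continue;
        }
        if (skip_prologue && ev[i].in_prologue)
            continue;
        pending += ev[i].bytes;
        if (pending < 0)
            pending = 0;
    }
    return ncalls;
}

/* Events taken from the eh1.C listing. */
static const struct event eh1_events[] = {
    { ADJUST,  12, 1 },  /* call's share of subl $16, %esp (inferred) */
    { ADJUST,   4, 0 },  /* pushl $4 */
    { CALL,     0, 0 },  /* __cxa_allocate_exception */
    { ADJUST, -12, 0 },  /* addl $12, %esp */
    { ADJUST,   4, 0 },  /* pushl $0 */
    { ADJUST,   4, 0 },  /* pushl L__ZTIi$non_lazy_ptr-... */
    { ADJUST,   4, 0 },  /* pushl %eax */
    { CALL,     0, 0 },  /* __cxa_throw */
};
```

Running args_at_calls over eh1_events with skip_prologue set gives 4 and
12, matching what the EH information says; with skip_prologue clear it
gives 16 and 16, matching what the calls really have.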

We could change CSA to not combine a prologue instruction with a
non-prologue instruction, although that would remove a (minor)
optimization.

We could change CSA so that when it combines a prologue instruction
with a non-prologue instruction it resets the RTX_FRAME_RELATED flag.
That probably wouldn't work.

We could change CSA so that when it combines a prologue instruction
with a non-prologue instruction it sets a new flag on the instruction,
and uses a table on the side to record the original values in the
instruction.
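
Very roughly, and only as a sketch of the idea: the types and names
below are hypothetical stand-ins, not GCC's rtx or any existing
interface, and the fixed-size table is just for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMBINED 64  /* arbitrary capacity for this sketch */

/* Hypothetical stand-in for an instruction. */
struct insn {
    int uid;            /* unique id of the instruction */
    int adjust;         /* stack adjustment in bytes */
    int frame_related;  /* models the RTX_FRAME_RELATED flag */
    int combined;       /* the new flag: adjustments were merged */
};

/* One side-table record: the original values before merging. */
struct combined_entry {
    int uid;
    int prologue_bytes;
    int body_bytes;
};

static struct combined_entry side_table[MAX_COMBINED];
static int side_table_len;

/* Merge a non-prologue adjustment into a prologue one, recording the
   original values.  Returns 0 on success, or -1 if the table is full,
   in which case nothing is changed and the caller should not combine. */
static int combine_adjustments(struct insn *prologue, const struct insn *body)
{
    assert(prologue->frame_related && !body->frame_related);
    if (side_table_len == MAX_COMBINED)
        return -1;
    side_table[side_table_len].uid = prologue->uid;
    side_table[side_table_len].prologue_bytes = prologue->adjust;
    side_table[side_table_len].body_bytes = body->adjust;
    side_table_len++;
    prologue->adjust += body->adjust;
    prologue->combined = 1;
    return 0;
}

/* Return the recorded original values, or NULL if the instruction was
   never combined. */
static const struct combined_entry *lookup_combined(const struct insn *insn)
{
    if (!insn->combined)
        return NULL;
    for (int i = 0; i < side_table_len; i++)
        if (side_table[i].uid == insn->uid)
            return &side_table[i];
    return NULL;
}
```

The code that computes argument sizes could then look up a flagged
prologue instruction and count only the body_bytes part.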

We could avoid nesting memcpy calls on ACCUMULATE_OUTGOING_ARGS
machines, in which case I think Alex's patch is unnecessary.

I'm not sure about your option 1--Alex didn't say that he couldn't
reproduce the bug in mainline, he said he didn't have a test case for
the specific case of memcpy popping the arguments off the stack on
return.

Ian
