Quoting Stefan Dösinger <ste...@codeweavers.com>:

Here's some code attached that actually works, but is far from perfect.

The 'msvc_prologue' attribute is limited to 32 bit. None of the applications
that try to place hooks are ported to Win64 yet, so it is impossible to tell
what they'll need. Besides, maybe I am lucky that when they appear I can
tell their authors to think about Wine.

The first problem I (still) have is passing the msvc_prologue attribute
around. I abused some other code to do that. How does the attribute handling
work, and what would the preferred way be?

Well, you could query the attribute every time you need it, but if that would
cause performance issues or significant code bloat, caching information
computed from the attributes in the machine specific struct is fine.
If you only need a single bit and accesses are not too frequent,
you can also consider making the machine struct member a char or bitfield,
so that it can be stored compactly together with other small struct members.
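
A minimal sketch of the caching idea, not actual GCC code: the struct, the
attribute list representation and both function names below are hypothetical
stand-ins. The attribute is looked up once and the result is kept in a
one-bit field that packs together with other small members.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the per-function machine specific struct.  */
struct machine_function_sketch
{
  unsigned char some_small_member;      /* placeholder for existing data */
  unsigned int uses_msvc_prologue : 1;  /* cached attribute bit */
  unsigned int other_flag : 1;          /* packs into the same storage unit */
};

/* Hypothetical attribute query: a linear search over attribute names.
   This is the "expensive" lookup we do not want to repeat.  */
static bool
has_attribute_sketch (const char *const *attrs, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (attrs[i], name) == 0)
      return true;
  return false;
}

/* Query the attribute once and cache the answer in the struct.  */
static void
cache_function_attributes (struct machine_function_sketch *m,
                           const char *const *attrs, size_t n)
{
  assert (m != NULL);
  memset (m, 0, sizeof *m);
  m->uses_msvc_prologue = has_attribute_sketch (attrs, n, "msvc_prologue");
}
```

Later code, such as the prologue expander, would then test
m->uses_msvc_prologue instead of walking the attribute list again.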

The 2nd thing I have to figure out is how and when I have to set
REG_FRAME_RELATED_EXPR.

It's when there is some operation affecting cfi which is not expressed (in
simple enough terms for dwarf2out.c to grok it) in the rtl instruction
pattern.  Since your nop doesn't affect the call frame or registers, no
call frame information needs to be emitted for it.

You'll have to also change the third instruction in your 'magic' sequence
so that it is or contains an unspec, to prevent it from going walkabout
when optimizations like instruction scheduling are performed.
If you make it a parallel where the actual operation is paired with an
empty unspec, no REG_FRAME_RELATED_EXPR is needed.  If the actual operation
is hidden in the RTL, however, you have to add it in a REG_FRAME_RELATED_EXPR.
The latter alternative is more complicated.  However, there is a benefit to
choosing this: in the stack realign or !frame_pointer_needed cases, the
(early) move of esp to ebp is not really supposed to establish a frame
pointer, and thus you then don't want any cfi information emitted for it.
Thus, you can then simply leave out the REG_FRAME_RELATED_EXPR note.


The msvc_prologue + frame_pointer_needed + !stack_realignment_needed case
produces the best possible code. The fp setup from msvc_prologue is used for
its purpose.

The msvc_prologue + !frame_pointer_needed case could be optimized, as you
said. However, that changes all the stack frame offsets and sizes, and I do
not quite understand all the code I have to modify for that. I think this
should be a separate patch, although arguably it should be ready before
msvc_prologue is added. I personally don't care too much about this
combination of parameters (Wine won't need/use it), so this optimization
would get lost otherwise.

The code needed shouldn't be large, but if nobody would use it, it wouldn't
be tested either, so even if you got it right initially it would be prone
to bitrot.  So if you'd need extra code for it but nobody would use it, just
add a comment in the code and the option documentation that this is an
optimization that could be added / is not implemented.

With stack_realignment_needed, frame_pointer_needed is on as well, and this
code is created (copy-pasted together by hand; somehow the stack alignment
attribute doesn't do anything on my Linux box):

movl.s %edi, %edi
pushl  %ebp
movl.s %esp, %ebp
pop    %ebp
lea    0x4(%esp),%ecx
and    $0xfffffff0,%esp
pushl  -0x4(%ecx)
push   %ebp
mov    %esp,%ebp

If we try to get rid of the pop-push-mov, the following things change:

*) The value of %ebp

Yes, you have to re-do the move from esp to ebp after stack realignment.

*) The location of the pushed ebp on the stack

That should be fine if you make your prologue expect it there.

*) The alignment of %esp after the whole procedure (it is alignment+4 before,
and the +4 is lost afterwards)

Alignment +0 is actually better - best would be to make the alignment offset
so that no alignment padding is done for the first highly-aligned stack slot,
but that would be really a general stack size optimization, i.e. it goes
beyond the scope of the current problem, and I don't think you want to
go into this right now.
Basically, what we do with saving ebp before stack realignment is that we
remove the ebp stack slot from the call frame proper.
So you just need to say that the size of the saved registers - and thus
the total frame size - is 4 bytes less than if the ebp slot was included
in the frame.  Make sure that the argument pointer still has the right value,
and everything should be fine.
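
As a rough illustration of that bookkeeping (a sketch only; the struct and
function names are hypothetical and do not correspond to the real i386 frame
layout code), excluding the early-saved ebp slot shrinks the saved register
area, and therefore the total frame size, by one 4-byte word:

```c
#include <assert.h>
#include <stdbool.h>

#define WORD_SIZE_SKETCH 4  /* one stack slot on 32-bit x86 */

/* Hypothetical, heavily simplified frame description.  */
struct frame_sketch
{
  int nregs_saved;                /* registers pushed, counting ebp */
  int locals_size;                /* bytes of local variables */
  bool ebp_saved_before_realign;  /* ebp was pushed before realignment */
};

/* Size of the saved register area that belongs to the frame proper.
   If ebp was pushed before the stack was realigned, its slot lies
   outside the frame and is not counted.  */
static int
saved_regs_size_sketch (const struct frame_sketch *f)
{
  int n = f->nregs_saved;
  if (f->ebp_saved_before_realign)
    {
      assert (n > 0);  /* ebp must be among the saved registers */
      n -= 1;
    }
  return n * WORD_SIZE_SKETCH;
}

/* Total frame size: saved registers plus locals (padding ignored).  */
static int
total_frame_size_sketch (const struct frame_sketch *f)
{
  return saved_regs_size_sketch (f) + f->locals_size;
}
```

The sketch ignores alignment padding and the argument pointer, which still
have to be handled as described above.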
