Quoting Stefan Dösinger <ste...@codeweavers.com>:
Here's some code attached that actually works, but is far from perfect. The 'msvc_prologue' attribute is limited to 32 bit. None of the applications that try to place hooks are ported to Win64 yet, so it is impossible to tell what they'll need. Besides, maybe I'll be lucky and, when they appear, I can tell their authors to think about Wine. The first problem I (still) have is passing the msvc_prologue attribute around. I abused some other code to do that. How does the attribute handling work, and what would the preferred way be?
Well, you could query the attribute every time you need it, but if that would cause performance issues or significant code bloat, caching information computed from the attributes in the machine specific struct is fine. If you only need a single bit and accesses are not too frequent, you can also consider making the machine struct member a char or bitfield, so that it can be stored efficiently together with other small struct members.
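To illustrate the caching idea, here is a minimal standalone sketch. It is not GCC code: the struct, the string-array attribute list and both function names are hypothetical stand-ins for the machine specific struct and the real attribute lookup, chosen only so that the example compiles on its own.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the per-function machine specific struct.
   The two one-bit fields share storage with each other, so caching the
   attribute costs next to nothing in struct size.  */
struct machine_function_sketch
{
  int frame_size;                    /* some larger, unrelated member */
  unsigned int msvc_prologue : 1;    /* cached: attribute is present */
  unsigned int msvc_prologue_known : 1; /* cache has been filled in */
};

/* Hypothetical attribute lookup: a linear scan over attribute names.
   This is the (comparatively) expensive query we want to do only once.  */
static bool
has_attribute (const char *const *attrs, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (attrs[i], name) == 0)
      return true;
  return false;
}

/* Return whether the function uses msvc_prologue, querying the
   attribute list on the first call and the cached bit afterwards.  */
bool
function_uses_msvc_prologue (struct machine_function_sketch *m,
                             const char *const *attrs, size_t n)
{
  if (!m->msvc_prologue_known)
    {
      m->msvc_prologue = has_attribute (attrs, n, "msvc_prologue");
      m->msvc_prologue_known = 1;
    }
  return m->msvc_prologue;
}
```

After the first call the attribute list is not consulted again, so the prologue and epilogue code can test the bit as often as it likes.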
The 2nd thing I have to figure out is how and when I have to set REG_FRAME_RELATED_EXPR.
It's when there is some operation affecting cfi which is not expressed (in simple enough terms for dwarf2out.c to grok it) in the rtl instruction pattern. Since your nop doesn't affect the call frame or registers, no call frame information needs to be emitted for it. You'll also have to change the third instruction in your 'magic' sequence so that it is or contains an unspec, to prevent it from going walkabout when optimizations like instruction scheduling are performed. If you make it a parallel where the actual operation is paired with an empty unspec, no REG_FRAME_RELATED_EXPR is needed. If the actual operation is hidden in the RTL, however, you have to add it in a REG_FRAME_RELATED_EXPR. The latter alternative is more complicated. However, there is a benefit to choosing it: in the stack realign or !frame_pointer_needed cases, the (early) move of esp to ebp is not really supposed to establish a frame pointer, and thus you then don't want any cfi information emitted for it. Thus, you can then simply leave out the REG_FRAME_RELATED_EXPR note.
The msvc_prologue + frame_pointer_needed + !stack_realignment_needed case produces the best possible code. The fp setup from msvc_prologue is used for its purpose. The msvc_prologue + !frame_pointer_needed case could be optimized, as you said. However, that changes all the stack frame offsets and sizes, and I do not quite understand all the code I have to modify for that. I think this should be a separate patch, although arguably it should be ready before msvc_prologue is added. I personally don't care too much about this combination of parameters (Wine won't need/use it), so this optimization would get lost otherwise.
The code needed shouldn't be large, but if nobody would use it, it wouldn't be tested either, so even if you got it right initially it would be prone to bitrot. So if you'd need extra code for it but nobody would use it, just add a comment in the code and the option documentation that this is an optimization that could be added / is not implemented.
With stack_realignment_needed frame_pointer_needed is on as well, and this code is created (copypasted together by hand, somehow the stack alignment attribute doesn't do anything on my Linux box):

    movl.s %edi, %edi
    pushl %ebp
    movl.s %esp, %ebp
    pop %ebp
    lea 0x4(%esp),%ecx
    and $0xfffffff0,%esp
    pushl -0x4(%ecx)
    push %ebp
    mov %esp,%ebp

If we try to get rid of the pop-push-mov, the following things change:
*) The value of %ebp
Yes, you have to re-do the move from esp to ebp after stack realignment.
*) The location of the pushed ebp on the stack
That should be fine if you make your prologue expect it there.
*) The alignment of %esp after the whole procedure (it's alignment+4 before, and the +4 is lost afterwards)
Alignment +0 is actually better - best would be to make the alignment offset so that no alignment padding is done for the first highly-aligned stack slot, but that would be really a general stack size optimization, i.e. it goes beyond the scope of the current problem, and I don't think you want to go into this right now. Basically, what we do with saving ebp before stack realignment is that we remove the ebp stack slot from the call frame proper. So you just need to say that the size of the saved registers - and thus the total frame size - is 4 bytes less than if the ebp slot was included in the frame. Make sure that the argument pointer still has the right value, and everything should be fine.
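As a rough illustration of the frame size bookkeeping, here is a toy model. It is not GCC's actual frame layout code: the function names, the 4-byte word size constant and the simple "saved registers plus locals" formula are assumptions made for the sketch.

```c
#include <stdint.h>

#define WORD_SIZE 4u   /* assumed: 32-bit target, 4 bytes per register */

/* Size of the saved-register area.  N_SAVED counts every callee-saved
   register the function pushes, ebp included.  If ebp was already
   pushed before stack realignment, its slot is not part of the call
   frame proper, so it is left out of the count.  */
unsigned
saved_regs_size (unsigned n_saved, int ebp_saved_before_realign)
{
  if (ebp_saved_before_realign && n_saved > 0)
    n_saved -= 1;
  return n_saved * WORD_SIZE;
}

/* Toy total frame size: saved registers plus local variables.  */
unsigned
total_frame_size (unsigned n_saved, unsigned locals_size,
                  int ebp_saved_before_realign)
{
  return saved_regs_size (n_saved, ebp_saved_before_realign) + locals_size;
}

/* What `and $0xfffffff0,%esp` does to the stack pointer: round it
   down to a 16-byte boundary.  */
uint32_t
realign_esp (uint32_t esp)
{
  return esp & 0xfffffff0u;
}
```

With three saved registers and 16 bytes of locals, the model gives 28 bytes when the ebp slot is counted and 24 bytes when it is not, which is the 4-byte difference described above.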