Re: [PATCH] Turn on LRA on all targets

Maciej W. Rozycki Fri, 16 Feb 2024 16:38:44 -0800

On Fri, 16 Feb 2024, Maciej W. Rozycki wrote:

> On Fri, 16 Feb 2024, Jakub Jelinek wrote:
> 
> > >  There is no function prologue to optimise in the VAX case, because all 
> > > the frame setup has already been made by the CALLS instruction itself in 
> > > the caller.  The first machine instruction of the callee is technically 
> > > already past the "prologue".  And then RET serves as the whole function 
> > > "epilogue".
> > 
> > So, what is the problem with DWARF unwinding?  Just make sure to emit
> > appropriate instructions describing the saving of the corresponding
> > registers at specific points based on CFA at the start of the function
> > (so that it appears in CIE instructions) and that should be all that is
> > needed, no?
> 
>  I may not remember all the issues correctly offhand as it's been a while 
> since I looked into it, but as I recall DWARF handling code has not been 
> prepared for all the frame to have been already allocated and initialised 
> at a function's entry point, and also at least DWARF-4 is IIRC required to 
> have statics at offsets positive from FP (for a stack growing downwards).


 There is a further complication actually where lazy binding is in use.  
In that case a function that has been jumped to indirectly from the lazy 
resolver will often have a different number of statics saved in the frame 
from where the function has been called directly via a fully resolved PLT 
GOT entry.

 This is because at the time the lazy resolver is being called it is not 
known what statics the ultimate callee wants to save, as it is not a part 
of the ABI.  Therefore the worst condition is assumed and the resolver 
requests all the statics (R6-R11) to be saved, observing that saving more 
statics than required makes no change to code semantics, it just hurts 
performance (but calls to the lazy resolver are rare, so this is not a big 
deal).  Conversely when the function has been already resolved, the PLT 
GOT entry points at the callee instead, which will then only save the 
statics it has requested itself, knowing them to be used.

 Obviously a frame that has all the statics saved will have a different 
size of its variable part and slots will have been assigned differently 
there from the case where only some statics have been saved.  Of course it 
does not matter for regular code execution as RET will always correctly 
interpret a stack frame and restore exactly these statics that have been 
saved in the frame, but for unwinding actual frame contents have to be 
interpreted.

 I am not sure if this run-time dependent frame layout can be described in 
DWARF terms even, so I am leaning towards concluding a native unwinder is 
the only feasible way to go.

 For those who are unaware how information as to what statics are to be 
saved is made available by functions with VAX hardware: it is embedded at 
the function's address in a form of a 16-bit data quantity, which is a 
register save bitmask (an entry mask in VAX-speak) for registers R0-R11;
1 in the mask requests that the corresponding register be saved in the 
callee's frame by the CALLS instruction.  Once the frame has been built by 
CALLS, control is then passed to the location immediately following the 
bitmask, which is the function's actual entry point, i.e. the PC is set 
to the function's address + 2.

  Maciej

Re: [PATCH] Turn on LRA on all targets

Reply via email to