Re: [lldb-dev] Accessing DWARF information from C++

2015-10-30 Thread Stefan Kratochwil via lldb-dev

Hi Greg,

thanks for your reply.

I am developing a dynamic software updating tool for dynamically linked 
C libraries (in short: I want to patch dynamic libraries in-memory).


My goal is to achieve that without any code instrumentation. The main 
problem is the state transformation between the old and the new version 
- here I need as much type (and location) information as possible.


I am currently unaware of what types of information I _really_ need, so 
for now, getting as much information as possible is my way to go. I 
guess that the most important types of information will be:

- size of primitives
- size of pointers
- type hierarchies and
- the in-memory structure of compound data types
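
To make the last item concrete, here is a minimal Python sketch. It is not part of the tool described above: ctypes stands in for the layout information that DWARF would provide, and ConfigV1, ConfigV2, layout and migrate are hypothetical names invented for illustration. It shows why a state transformation needs field offsets and sizes for both versions of a type.

```python
import ctypes


class ConfigV1(ctypes.Structure):
    # Hypothetical "old" version of a library data type.
    _fields_ = [("id", ctypes.c_int), ("name", ctypes.c_char_p)]


class ConfigV2(ctypes.Structure):
    # Hypothetical "new" version: one field was inserted in the middle.
    _fields_ = [("id", ctypes.c_int),
                ("flags", ctypes.c_int),
                ("name", ctypes.c_char_p)]


def layout(struct_type):
    """Map each field name to (offset, size) in bytes."""
    return {name: (getattr(struct_type, name).offset,
                   getattr(struct_type, name).size)
            for name, _ in struct_type._fields_}


def migrate(old, new_type):
    """Copy every field that exists in both versions; new fields stay zero."""
    new = new_type()
    old_names = {name for name, _ in old._fields_}
    for name, _ in new_type._fields_:
        if name in old_names:
            setattr(new, name, getattr(old, name))
    return new


if __name__ == "__main__":
    print("pointer size:", ctypes.sizeof(ctypes.c_void_p))
    print("v1 layout:", layout(ConfigV1))
    print("v2 layout:", layout(ConfigV2))
```

The printed offsets depend on the platform ABI, which is exactly the information that has to come from the debug info rather than be hard-coded.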

At the moment I am concentrating on the Linux / UNIX domain, so DWARF is 
my preferred format.


Cheers,
Stefan



On 10/14/2015 12:15 AM, Greg Clayton wrote:



On Oct 13, 2015, at 2:42 AM, Stefan Kratochwil via lldb-dev wrote:

Hi all,

I am currently developing an application where I need to access the DWARF 
debugging information of my target process and its loaded .so files.

In more detail, I need to match type information of certain entities within my 
code and its dynamically linked libraries.


I already use the lldb scripting bridge in my application, hence I would like 
to use lldb's DWARF parsing capabilities in my application, too.

Now, there is no (obvious) way to extract DIEs using the C++ API, so I need a 
few hints where to start. Does anyone have a minimal example for simply dumping 
DWARF info in a 'readelf -w' manner?


We don't dump DWARF unless it is done by enabling DWARF logging, but that won't 
help you. So if you want to just dump DWARF, use llvm-dwarfdump.


I further discovered another DWARF implementation within the llvm sources. 
After some investigation I found this discussion on lldb-dev:
http://lists.cs.uiuc.edu/pipermail/lldb-dev/2014-June/004197.html

Does anybody know if there is already some effort made to implement a lldwarf 
solution as a replacement for both mentioned implementations?


LLDB provides a debug info agnostic abstraction. If you want actual raw DWARF, 
then you should use llvm's DWARF parser. llvm's DWARF parser was created by 
copying the LLDB version and trimming it down. It was initially done to get to 
file and line info in the .debug_info, but it has been expanded since.


And, after going through the discussion, is it probably better for me to use 
the llvm fork?


So this depends on how DWARF specific you need things. With LLDB, you can 
look up types by name and get an SBType back that can give you all of the info 
that DWARF will give you. You can get SBFunction objects that describe 
functions, and you can look up an address and get an SBSymbolContext which will 
give you access to the SBModule (object file), SBCompileUnit, SBFunction, 
SBBlock, SBLineEntry and SBSymbol. So you will be able to use LLDB to do 
specific queries, but we won't give you exact DWARF access. So whether you use 
LLDB depends on what you are looking to get from the debug info. Can you 
clarify what kinds of queries you will be making? I might be able to tell you 
if you will be able to do them with our public API.
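
As a rough illustration of the kind of type query described above, here is a minimal Python sketch against the SB API. It only uses calls that exist in the public API (SBTarget.FindFirstType, SBType.IsValid, SBType.GetByteSize, SBType.GetNumberOfFields, SBType.GetFieldAtIndex, SBTypeMember.GetName, SBTypeMember.GetType, SBTypeMember.GetOffsetInBytes). The helper name describe_type, the library path and the type name in the comments are assumptions made up for this example.

```python
def describe_type(target, type_name):
    """Return (byte_size, [(field_name, field_type_name, offset_in_bytes), ...]).

    `target` is expected to be an lldb.SBTarget. Returns None when the type
    cannot be found in the target's debug info.
    """
    sb_type = target.FindFirstType(type_name)
    if not sb_type.IsValid():
        return None
    fields = []
    for index in range(sb_type.GetNumberOfFields()):
        member = sb_type.GetFieldAtIndex(index)
        fields.append((member.GetName(),
                       member.GetType().GetName(),
                       member.GetOffsetInBytes()))
    return sb_type.GetByteSize(), fields


# Usage sketch (needs the lldb Python module; path and type name are
# hypothetical):
#
#   import lldb
#   debugger = lldb.SBDebugger.Create()
#   target = debugger.CreateTarget("libexample.so")
#   print(describe_type(target, "my_state"))
```

This stays on the debug-info-agnostic side of the API: it gives sizes, member types and offsets, but not the underlying DIEs.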

Greg



___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] FW: LLDB: Unwinding based on Assembly Instruction Profiling

2015-10-30 Thread Abhishek Aggarwal via lldb-dev
Hi Jason

Thanks a lot for the detailed information. I am sorry to post my
queries a bit late. Here are a few things that I want to ask.

When eh_frame has epilogue description as well, the Assembly profiler
doesn't need to augment it. In this case, is eh_frame augmented unwind
plan used as Non Call Site Unwind Plan or Assembly based Unwind Plan
is used? I checked FuncUnwinders::GetUnwindPlanAtNonCallSite()
function. When there is nothing to augment in eh_frame Unwind plan,
then GetEHFrameAugmentedUnwindPlan() function returns nullptr and
AssemblyUnwindPlan is used as Non Call Site Unwind Plan. Is it the
expected behavior?

About your comments on gcc producing ''asynchronous unwind tables'',
do you mean that gcc is not producing asynchronous unwind tables as it
keeps *some* async unwind instructions and not all of them?

Abhishek


> Hi all, sorry I missed this discussion last week, I was a little busy.
>
> Greg's original statement isn't correct -- about a year ago Tong Shen changed 
> lldb to use eh_frame for the currently-executing frame.  While it is true 
> that eh_frame is not guaranteed to describe the prologue/epilogue, in 
> practice eh_frame always describes the prologue (gdb couldn't work 
> without this, with its much more simplistic unwinder).  Newer gcc's also 
> describe the epilogue.  clang does not (currently) describe the epilogue.  
> Tong's changes *augment* the eh_frame with an epilogue description if it 
> doesn't already have one.
>
> gcc does have an "asynchronous unwind tables" option -- "asynchronous" 
> meaning the unwind rules are defined at every instruction location.  But the 
> last time I tried it, it did nothing.  They've settled on an unfortunate 
> middle ground where eh_frame (which should be compact and only describe 
> enough for exception handling) has *some* async unwind instructions.  And the 
> same unwind rules are emitted into the debug_frame section, even if 
> -fasynchronous-unwind-tables is used.
>
> In the ideal world, eh_frame should be extremely compact and only sufficient 
> for exception handling.  debug_frame should be extremely verbose and describe 
> the unwind rules at all unwind locations.
>
> As Tamas says, there's no indication in eh_frame or debug_frame as to how 
> much is described:  call-sites only (for exception handling), call-sites + 
> prologue, call-sites + prologue + epilogue, or fully asynchronous.  It's a 
> drag, if the DWARF committee ever has enough reason to break open the 
> debug_frame format for some other changes, I'd like to get more information 
> in there.
>
>
> Anyway, point is, we're living off of eh_frame (possibly "augmented") for the 
> currently-executing stack frame these days.  lldb may avoid using the 
> assembly unwinder altogether in an environment where it finds eh_frame unwind 
> instructions for every stack frame.
>
>
> (on Mac, we've switched to a format called "compact unwind" -- much like the 
> ARM unwind info that Tamas recently added support for, this is an extremely 
> small bit of information which describes one unwind rule for the entire 
> function.  It is only applicable for exception handling, it has no way to 
> describe prologues/epilogues.  compact unwind is two 4-byte words per 
> function.  lldb will use compact unwind / ARM unwind info for the non-zeroth 
> stack frames.  It will use its assembly instruction profiler for the 
> currently-executing stack frame.)
>
> Hope that helps.
>
> J
>
>
>> On Oct 15, 2015, at 2:56 AM, Tamas Berghammer via lldb-dev 
>>  wrote:
>>
>> If we are trying to unwind from a non call site (frame 0 or signal handler) 
>> then the current implementation first tries to use the non call site unwind 
>> plan (usually assembly emulation) and, if that one fails, it falls back to 
>> the call site unwind plan (eh_frame, compact unwind info, etc.) instead of 
>> the architecture default unwind plan. The call site plan should be a better 
>> guess in general, and we usually fail with the assembly emulation based 
>> unwind plan for hand written assembly functions, where eh_frame is usually 
>> valid at all addresses.
>>
>> Generating asynchronous eh_frame (valid at all addresses) is possible with 
>> gcc (I am not sure about clang) but there is no way to tell if a given 
>> eh_frame inside an object file is valid at all addresses or only at call 
>> sites. The best approximation we can make is to say that each eh_frame entry 
>> is valid only at the address it specifies as its start address, but we don't 
>> make use of that in LLDB at the moment.
>>
>> For the 2nd part of the original question, I think changing the eh_frame 
>> based unwind plan after a failed unwind using instruction emulation is only 
>> a valid option for the PC where we tried to unwind from because the assembly 
>> based unwind plan could be valid at other parts of the function. Making the 
>> change for that 1 concrete PC address would make sense, but have practically 
>> no effect because the

[lldb-dev] [Bug 23560] lldb does not understand gcc-4.9 function prologues

2015-10-30 Thread via lldb-dev
https://llvm.org/bugs/show_bug.cgi?id=23560

abhiinnit...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |abhiinnit...@gmail.com
           Assignee|lldb-dev@lists.llvm.org     |abhiinnit...@gmail.com

--- Comment #1 from abhiinnit...@gmail.com ---
I'm working on it. I will submit a fix soon.

-- 
You are receiving this mail because:
You are the assignee for the bug.


[lldb-dev] [Bug 23139] Some tests fail with gcc-4.9.2 on Linux

2015-10-30 Thread via lldb-dev
https://llvm.org/bugs/show_bug.cgi?id=23139

abhiinnit...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |abhiinnit...@gmail.com
           Assignee|lldb-dev@lists.llvm.org     |abhiinnit...@gmail.com

--- Comment #1 from abhiinnit...@gmail.com ---
I am working on it. Hope to submit a fix for it.



[lldb-dev] [Bug 23139] Some tests fail with gcc-4.9.2 on Linux

2015-10-30 Thread via lldb-dev
https://llvm.org/bugs/show_bug.cgi?id=23139

abhiinnit...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lldb-dev@lists.llvm.org

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[lldb-dev] [Bug 23560] lldb does not understand gcc-4.9 function prologues

2015-10-30 Thread via lldb-dev
https://llvm.org/bugs/show_bug.cgi?id=23560

abhiinnit...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lldb-dev@lists.llvm.org



Re: [lldb-dev] LLDB: Unwinding based on Assembly Instruction Profiling

2015-10-30 Thread Jason Molenda via lldb-dev
Hi Abhishek,


> On Oct 30, 2015, at 6:56 AM, Abhishek Aggarwal wrote:
> 
> When eh_frame has epilogue description as well, the Assembly profiler
> doesn't need to augment it. In this case, is eh_frame augmented unwind
> plan used as Non Call Site Unwind Plan or Assembly based Unwind Plan
> is used?

Yes, you're correct.

If an eh_frame unwind plan describes the epilogue and the prologue, we will use 
it at "non-call sites", that is, the currently executing function.  

If we augment an eh_frame unwind plan by adding epilogue instructions, we will 
use it at non-call sites.

If an eh_frame unwind plan is missing epilogue, and we can't augment it for 
some reason, then it will not be used at non-call sites (the currently 
executing function).

The assembly unwind plan will be used for the currently executing function if 
we can't use the eh_frame unwind plan.



> I checked FuncUnwinders::GetUnwindPlanAtNonCallSite()
> function. When there is nothing to augment in eh_frame Unwind plan,
> then GetEHFrameAugmentedUnwindPlan() function returns nullptr and
> AssemblyUnwindPlan is used as Non Call Site Unwind Plan. Is it the
> expected behavior?


Yes.  FuncUnwinders::GetEHFrameAugmentedUnwindPlan gets the plain eh_frame 
unwind plan and passes it to UnwindAssembly_x86::AugmentUnwindPlanFromCallSite().

UnwindAssembly_x86::AugmentUnwindPlanFromCallSite will verify that the unwind 
plan describes the prologue.  If the prologue isn't described, it says that 
this cannot be augmented.

It then looks to see if the epilogue is described.  If the epilogue is 
described, it says the unwind plan is usable as-is.

If the epilogue is not described, it will use the assembly unwinder to add the 
epilogue unwind instructions.
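
The decision steps described in this message can be summarised in a small Python model. This is only a sketch of the prose above, not lldb's implementation: Plan, non_call_site_plan and the can_add_epilogue flag (standing in for "we can't augment it for some reason") are hypothetical names.

```python
from enum import Enum


class Plan(Enum):
    EH_FRAME = "eh_frame unwind plan, used as-is"
    EH_FRAME_AUGMENTED = "eh_frame unwind plan with epilogue rules added"
    ASSEMBLY = "assembly profiler unwind plan"


def non_call_site_plan(describes_prologue, describes_epilogue,
                       can_add_epilogue=True):
    """Pick the unwind plan for the currently executing function."""
    if not describes_prologue:
        # Without a prologue description the plan cannot be augmented.
        return Plan.ASSEMBLY
    if describes_epilogue:
        # Prologue and epilogue are both described: usable as-is.
        return Plan.EH_FRAME
    if can_add_epilogue:
        # The assembly unwinder supplies the missing epilogue rules.
        return Plan.EH_FRAME_AUGMENTED
    # Epilogue missing and augmentation failed: fall back to assembly.
    return Plan.ASSEMBLY


if __name__ == "__main__":
    for args in [(True, True), (True, False), (False, False)]:
        print(args, "->", non_call_site_plan(*args).value)
```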

> 
> About your comments on gcc producing ''asynchronous unwind tables'',
> do you mean that gcc is not producing asynchronous unwind tables as it
> keeps *some* async unwind instructions and not all of them?


"asynchronous" means that the unwind instructions are valid at every 
instruction location.

"synchronous" means that the unwind instructions are only valid at places where 
an exception can be thrown, or a function is called that may throw an exception.


Inside lldb, I use the terminology "non-call site" to mean "asynchronous".  
You're at an arbitrary instruction location, for instance, you're in the 
currently-executing function.  I use "call site" to mean synchronous - a 
function has called another function, so it's in the middle of the function 
body, past the prologue, before the epilogue.  This is a function higher up on 
the stack.

The terms are confusing, I know.

The last time I checked, gcc cannot be made to emit truly asynchronous unwind 
instructions.  This is easy to test on an i386 binary compiled with 
-fomit-frame-pointer.  For instance (the details will be a little different on 
an ELF system but I bet it will be similar if the program runs position 
independent aka pic):

% cat >test.c
#include <stdio.h>
int main () { puts ("HI"); }
^D
% clang  -arch i386 -fomit-frame-pointer test.c
% lldb a.out
(lldb) target create "a.out"
Current executable set to 'a.out' (i386).
(lldb) disass -b -n main
a.out`main:
a.out[0x1f70] <+0>:  83 ec 0c           subl   $0xc, %esp
a.out[0x1f73] <+3>:  e8 00 00 00 00     calll  0x1f78                    ; <+8>
a.out[0x1f78] <+8>:  58                 popl   %eax
a.out[0x1f79] <+9>:  8d 80 3a 00 00 00  leal   0x3a(%eax), %eax
a.out[0x1f7f] <+15>: 89 04 24           movl   %eax, (%esp)
a.out[0x1f82] <+18>: e8 0d 00 00 00     calll  0x1f94                    ; symbol stub for: puts

Look at the call instruction at +3.  What is this doing?  It calls the next 
instruction, which does a pop %eax. This is loading the address main+8 into eax 
so it can get the address of the "HI" string which is at main+8+0x3a.  It's 
called a "pic base", or position independent code base, because this program 
could be loaded at any address when it is run, the instructions can't directly 
reference the address of the "HI" string.
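
The arithmetic behind this can be checked with a short Python sketch that decodes the two call instructions from the disassembly above. The helper name call_target is made up for this example; the addresses and opcode bytes are the ones shown in the listing.

```python
import struct


def call_target(address, encoding):
    """Target of an x86 near relative call (opcode E8 followed by rel32).

    The displacement is relative to the address of the *next* instruction.
    """
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    (rel32,) = struct.unpack("<i", encoding[1:5])
    return address + len(encoding) + rel32


# calll at main+3 with a zero displacement: it "calls" the next instruction,
# so the pushed return address (the pic base) is main+8.
pic_base = call_target(0x1F73, bytes.fromhex("e800000000"))

# leal 0x3a(%eax), %eax then turns the pic base into the string's address.
string_address = pic_base + 0x3A

# calll at main+18 goes to the puts stub.
puts_stub = call_target(0x1F82, bytes.fromhex("e80d000000"))

if __name__ == "__main__":
    print(hex(pic_base), hex(string_address), hex(puts_stub))
```

If the image is loaded at a different address, both the pic base and the string move by the same amount, which is why the 0x3a displacement stays valid.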

If I run this program and have lldb dump its assembly unwind rules for the 
function:

(lldb) image show-unwind -n main
row[0]:0: CFA=esp +4 => esp=CFA+0 eip=[CFA-4] 
row[1]:3: CFA=esp+16 => esp=CFA+0 eip=[CFA-4] 
row[2]:8: CFA=esp+20 => esp=CFA+0 eip=[CFA-4] 
row[3]:9: CFA=esp+16 => esp=CFA+0 eip=[CFA-4] 
row[4]:   34: CFA=esp +4 => esp=CFA+0 eip=[CFA-4] 

It gets this right.  After the call instruction at +3, the CFA is now esp+20 
because we just pushed a word onto the stack.  And after the pop instruction at 
+8, the CFA is back to esp+16 because we popped that word off the stack.
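
The first four rows can be reproduced by tracking how each instruction moves esp. The following Python sketch is only a toy model of that bookkeeping, not lldb's assembly profiler: cfa_rows and the MAIN_PROLOGUE table are hypothetical, and the table covers just the instructions visible in the disassembly above (so it does not produce the final row at offset 34).

```python
WORD_SIZE = 4  # i386


def cfa_rows(instructions, entry_cfa_offset=WORD_SIZE):
    """Return [(function_offset, cfa_offset_from_esp), ...].

    `instructions` is a list of (offset, length, bytes_pushed) tuples, where
    bytes_pushed is positive when esp decreases and negative when it
    increases.  On entry only the return address is on the stack, so
    CFA = esp + 4.  A new row starts after every instruction that moves esp.
    """
    rows = [(0, entry_cfa_offset)]
    cfa_offset = entry_cfa_offset
    for offset, length, bytes_pushed in instructions:
        if bytes_pushed:
            cfa_offset += bytes_pushed
            rows.append((offset + length, cfa_offset))
    return rows


MAIN_PROLOGUE = [
    (0, 3, 12),          # subl  $0xc, %esp      : reserves 12 bytes
    (3, 5, WORD_SIZE),   # calll 0x1f78          : pushes the return address
    (8, 1, -WORD_SIZE),  # popl  %eax            : pops it again
    (9, 6, 0),           # leal  0x3a(%eax), %eax
    (15, 3, 0),          # movl  %eax, (%esp)
]

if __name__ == "__main__":
    for offset, cfa in cfa_rows(MAIN_PROLOGUE):
        print(f"{offset:3}: CFA=esp+{cfa}")
```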

An asynchronous unwind plan would describe these stack movements.  A 
synchronous unwind plan will not -- they are before any point where we could 
throw an exception, or before we call another function.
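
To see what goes wrong when a call-site-only rule is applied at an arbitrary instruction, here is a small Python sketch. ASYNC_ROWS is copied from the show-unwind output above; cfa_offset_at and the single-rule CALL_SITE_CFA_OFFSET (what a synchronous plan for this function would presumably record for the puts call site) are assumptions for illustration.

```python
import bisect

# (function offset, CFA offset from esp), from "image show-unwind -n main".
ASYNC_ROWS = [(0, 4), (3, 16), (8, 20), (9, 16), (34, 4)]

# Assumed synchronous plan: one rule, correct at the call to puts (main+18).
CALL_SITE_CFA_OFFSET = 16


def cfa_offset_at(rows, pc_offset):
    """Return the CFA offset of the last row starting at or before pc_offset."""
    starts = [start for start, _ in rows]
    index = bisect.bisect_right(starts, pc_offset) - 1
    if index < 0:
        raise ValueError("pc is before the first row")
    return rows[index][1]


def return_address_slot(cfa_offset):
    """eip=[CFA-4]: offset from esp of the saved return address."""
    return cfa_offset - 4


if __name__ == "__main__":
    for pc in (0, 3, 8, 9, 18):
        correct = return_address_slot(cfa_offset_at(ASYNC_ROWS, pc))
        guessed = return_address_slot(CALL_SITE_CFA_OFFSET)
        status = "ok" if correct == guessed else "WRONG"
        print(f"main+{pc}: saved eip at esp+{correct}, "
              f"call-site rule says esp+{guessed} ({status})")
```

At main+18 both agree; at main+0 and main+8 the call-site rule reads the return address from the wrong stack slot, which is the failure an asynchronous plan avoids.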

(notice that you need to use -fomit-frame-pointer to get this problem.  If ebp 
is set up as the frame pointer, it doesn't matter how we change the stack 
po