[Lldb-commits] [PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry

David Blaikie via Phabricator via lldb-commits Mon, 07 Oct 2019 18:32:41 -0700

dblaikie added inline comments.


================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:291-295
+  EntryIterator Absolute =
+      getAbsoluteLocations(
+          SectionedAddress{BaseAddr, SectionedAddress::UndefSection},
+          LookupPooledAddress)
+          .begin();
----------------
labath wrote:
> dblaikie wrote:
> > labath wrote:
> > > This parallel iteration is not completely nice, but I think it's worth 
> > > being able to reuse the absolute range computation code. I'm open to 
> > > ideas for improvement though.
> > Ah, I see - this is what you meant about "In particular it makes it 
> > possible to reuse this stuff in the dumping code, which would have been 
> > pretty hard with callbacks.".
> > 
> > I'm wondering if that might be worth revisiting somewhat. A full iterator 
> > abstraction for one user here (well, two once you include lldb - but I 
> > assume it's likely going to build its own data structure from the iteration 
> > anyway, right? (it's not going to keep the iterator around, do anything 
> > interesting like partial iterations, re-iterate/etc - such that a callback 
> > would suffice))
> > 
> > I could imagine two callback APIs for this - one that gets entries and 
> > locations and one that only gets locations by filtering on the entry 
> > version.
> > 
> > eg:
> > 
> >   // for non-verbose output:
> >   LL.forEachEntry([&](const Entry &E, Expected<DWARFLocation> L) {
> >     if (Verbose && actually dumping debug_loc)
> >       print(E) // print any LLE_*, raw parameters, etc
> >     if (L)
> >       print(*L) // print the resulting address range, section name (if 
> > verbose), 
> >     else
> >       print(error stuff)
> >   });
> > 
> > One question would be "when/where do we print the DWARF expression" - if 
> > there's an error computing the address range, we can still print the 
> > expression, so maybe that happens unconditionally at the end of the 
> > callback, using the expression in the Entry? (then, arguably, the 
> > expression doesn't need to be in the DWARFLocation - and I'd say make the 
> > DWARFLocation a sectioned range, exactly the same type as for ranges so 
> > that part of the dumping code, etc, can be maximally reused)
> Actually, what lldb currently does is that it does not build any data 
> structures at all (except storing the pointer to the right place in the 
> debug_loc section. Then, whenever it wants to do something to the loclist, it 
> parses it afresh. I don't know why it does this exactly, but I assume it has 
> something to do with most locations never being used, or being only a couple 
> of times, and the actual parsing being fairly fast. What this means is that 
> lldb is not really a single "user", but there are like four or five places 
> where it iterates through the list, depending on what does it actually want 
> to do with it. It also does partial iteration where it stops as soon as it 
> find the entry it was interested in.
> Now, all of that is possible with a callback (though I am generally trying to 
> avoid them), but it does resurface the issue of what should be the value of 
> the second argument for DW_LLE_base_address entries (the thing which I 
> originally used a error type for).
> Maybe this should be actually one callback API, taking two callback 
> functions, with one of them being invoked for base_address entries, and one 
> for others? However, if we stick to the current approaches in both LLE and 
> RLE of making the address pool resolution function a parameter (which I'd 
> like to keep, as it makes my job in lldb easier), then this would actually be 
> three callbacks, which starts to get unwieldy. Though one of those callbacks 
> could be removed with the "DWARFUnit implementing a AddrOffsetResolver 
> interface" idea, which I really like. :)
Ah, thanks for the details on LLDB's location parsing logic. That's interesting 
indeed!

I can appreciate an iterator-based API if that's the sort of usage we've got, 
though I expect it doesn't have any interest in the low-level encoding & just 
wants the fully processed address ranges/locations - it doesn't want 
base_address or end_of_list entries? & I think the dual-iteration is a fairly 
awkward API design, trying to iterate them in lock-step, etc. I'd rather avoid 
that if reasonably possible.

Either having an iterator API that gives only the fully processed data/semantic 
view & a completely different API if you want to access the low level 
primitives (LLE, etc) (this is how ranges works - there's an API that gives a 
collection of ranges & abstracts over v4/v5/rnglists/etc - though that's partly 
motivated by a strong multi-client need for that functionality for symbolizing, 
etc - but I think it's a good abstraction/model anyway (& one of the reasons 
the inline range list printing doesn't include encoding information, the API it 
uses is too high level to even have access to it))

> Now, all of that is possible with a callback (though I am generally trying to 
> avoid them), but it does resurface the issue of what should be the value of 
> the second argument for DW_LLE_base_address entries (the thing which I 
> originally used a error type for).

Sorry, my intent in the above API was for the second argument to be Optional's 
"None" state when... oh, I see, I did use Expected there, rather than Optional, 
because there are legit error cases.

I know it's sort of awkward, but I might be inclined to use 
Optional<Expected<AddressRange>> there. I realize two layers of wrapping is a 
bit weird, but I think it'd be nicer than having an error state for what, I 
think, isn't erroneous.

> Maybe this should be actually one callback API, taking two callback 
> functions, with one of them being invoked for base_address entries, and one 
> for others? However, if we stick to the current approaches in both LLE and 
> RLE of making the address pool resolution function a parameter (which I'd 
> like to keep, as it makes my job in lldb easier), then this would actually be 
> three callbacks, which starts to get unwieldy.

Don't mind three callbacks too much.

> Though one of those callbacks could be removed with the "DWARFUnit 
> implementing a AddrOffsetResolver interface" idea, which I really like. :)

Sorry, I haven't really looked at where the address resolver callback is 
registered and alternative designs being discussed - but yeah, going off just 
the one-sentence, it seems reasonable to have the DWARFUnit own an address 
resolver/be the thing you consult when you want to resolve an address (just 
through a normal function call in DWARFUnit, perhaps - which might, internally, 
use a callback registered when it was constructed).


================
Comment at: test/CodeGen/X86/debug-loclists.ll:16
 ; CHECK-NEXT: 0x00000000:
-; CHECK-NEXT:  [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0
-; CHECK-NEXT:  [0x0000000000000004, 0x0000000000000012): DW_OP_breg3 RBX+0
-
-; There is no way to use llvm-dwarfdump atm (2018, october) to verify the 
DW_LLE_* codes emited,
-; because dumper is not yet implements that. Use asm code to do this check 
instead.
-;
-; RUN: llc -mtriple=x86_64-pc-linux -filetype=asm < %s -o - | FileCheck %s 
--check-prefix=ASM
-; ASM:      .section .debug_loclists,"",@progbits
-; ASM-NEXT: .long .Ldebug_loclist_table_end0-.Ldebug_loclist_table_start0 # 
Length
-; ASM-NEXT: .Ldebug_loclist_table_start0:
-; ASM-NEXT:  .short 5                              # Version
-; ASM-NEXT:  .byte 8                               # Address size
-; ASM-NEXT:  .byte 0                               # Segment selector size
-; ASM-NEXT:  .long 0                               # Offset entry count
-; ASM-NEXT: .Lloclists_table_base0:                
-; ASM-NEXT: .Ldebug_loc0:
-; ASM-NEXT:  .byte 4                               # DW_LLE_offset_pair
-; ASM-NEXT:  .uleb128 .Lfunc_begin0-.Lfunc_begin0  # starting offset
-; ASM-NEXT:  .uleb128 .Ltmp0-.Lfunc_begin0         # ending offset
-; ASM-NEXT:  .byte 2                               # Loc expr size
-; ASM-NEXT:  .byte 117                             # DW_OP_breg5
-; ASM-NEXT:  .byte 0                               # 0
-; ASM-NEXT:  .byte 4                               # DW_LLE_offset_pair
-; ASM-NEXT:  .uleb128 .Ltmp0-.Lfunc_begin0         # starting offset
-; ASM-NEXT:  .uleb128 .Ltmp1-.Lfunc_begin0         # ending offset
-; ASM-NEXT:  .byte 2                               # Loc expr size
-; ASM-NEXT:  .byte 115                             # DW_OP_breg3
-; ASM-NEXT:  .byte 0                               # 0
-; ASM-NEXT:  .byte 0                               # DW_LLE_end_of_list
-; ASM-NEXT: .Ldebug_loclist_table_end0:
+; CHECK-NEXT:    [DW_LLE_offset_pair ]: 0x0000000000000000, 0x0000000000000004 
=> [0x0000000000000000, 0x0000000000000004) DW_OP_breg5 RDI+0
+; CHECK-NEXT:    [DW_LLE_offset_pair ]: 0x0000000000000004, 0x0000000000000012 
=> [0x0000000000000004, 0x0000000000000012) DW_OP_breg3 RBX+0
----------------
labath wrote:
> dblaikie wrote:
> > labath wrote:
> > > This tries to follow the RLE format as closely as possible, but I think 
> > > something like
> > > ```
> > > [DW_LLE_offset_pair,  0x0000000000000000, 0x0000000000000004] => 
> > > [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0
> > > ```
> > > would make more sense (both here and for RLE).
> > Yep, that'd make more sense to me - are you planning to unify the codepaths 
> > for this? I think that'd be for the best.
> > 
> > If I were picking a printing from scratch, I might go with:
> > 
> >   DW_LLE_offset_pair(0x0000, 0x0004) => [0x0000, 0x0004): DW_OP_breg5 RDI+0
> > 
> > Making it look a bit more like a function call and function arguments. 
> > Though the () might be confusing with the range notation. 
> > 
> > I'm also undecided on the " => " separator. Whether a ':' might be 
> > better/fine, etc.
> > 
> > Totally open to ideas, but mostly I'd really love these to use loclist and 
> > ranges to use the same code as much as possible, so we can get consistency 
> > and any readability benefits, etc in both.
> I like the function call format. I hoping to get some code reuse, though it's 
> still not fully clear to me how to achieve that..
I've posted my unification of range/loc/v4/v5 emission here: 
https://reviews.llvm.org/D68620 - & I'd imagine something similar in the 
parsing side.


================
Comment at: test/DebugInfo/X86/dwarfdump-debug-loclists.test:7
 # CHECK:       DW_AT_location [DW_FORM_sec_offset]   (0x0000000c
-# CHECK-NEXT:    [0x0000000000000010, 0x0000000000000020): DW_OP_breg5 RDI+0
-# CHECK-NEXT:    [0x0000000000000530, 0x0000000000000540): DW_OP_breg6 RBP-8, 
DW_OP_deref
-# CHECK-NEXT:    [0x0000000000000700, 0x0000000000000710): DW_OP_breg5 RDI+0
+# CHECK-NEXT:    [DW_LLE_offset_pair  ]: 0x0000000000000000, 
0x0000000000000010 => [0x0000000000000010, 0x0000000000000020) DW_OP_breg5 RDI+0
+# CHECK-NEXT:    [DW_LLE_base_address ]: 0x0000000000000500
----------------
labath wrote:
> dblaikie wrote:
> > I don't think the inline dumping should print the encoding - I'd borrow a 
> > lot from/try to unify with the ranges printing, which doesn't. I think 
> > verbose ranges print the same as non-verbose except they also add the 
> > section name/number.
> Sure, I can do that, though I think that means there won't be a single place 
> where one can see both the raw encodings and their interpretation -- 
> section-based dumping will not show the interpretation (would you want me to 
> show still show them I they happen to be interpretable without the base 
> address or the address pool?), and the debug_info dumping will not show the 
> encoding. Is that bad? -- I don't know...
Fair - that comes back to the issue I mentioned in a previous comment about 
potentially limiting dumping of non-debug_info sections based on the presence 
of a CU that references it (& only dumping it that way, rather than trying to 
parse it without a CU). DWARF isn't really designed to be parsed without the CU 
anyway. (could leave it in as best-effort to parse things without a referencing 
CU for debugging, etc).

Mostly I'm interested in unification perhaps more/primarily, than feature 
improvements - then we can make feature improvements to both ranges and locs 
without having to duplicate things.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68270/new/

https://reviews.llvm.org/D68270



_______________________________________________
lldb-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry

Reply via email to