Re: [Dwarf-Discuss] address pool + offset representation
Ping - any ideas?

On Wed, May 17, 2017 at 4:17 PM David Blaikie wrote:
> A big part of Fission debug info in the object files of an optimized build
> is unique address relocations in debug_addr and debug_ranges. I have an
> example binary where, for N bytes of .text, there are ~2N bytes of
> .rela.debug_addr and >2N bytes of .rela.debug_ranges.
>
> Given that .rela.debug_line is about 1% of the size of .text, that gives
> a sense of the lower bound: there should be only a few more relocations
> needed in .debug_addr than in .debug_line (relocations for global
> variables).
>
> This arises because things like low_pc (for each subprogram, and then for
> each lexical block inside it) use distinct addresses in the address pool,
> when they could use an offset relative to some other known address in the
> pool (basically one address for each .text section, with everything else
> using that address + offset).
>
> The new *x forms (startx*, base_addressx) in the debug_rnglists format
> address the relocations for the debug_ranges section, allowing it to reuse
> addresses in debug_addr.
>
> But to address debug_addr's redundancy, it would be nice to have an
> abbreviation form to represent "address in the pool, plus a constant
> offset".
>
> What form should this take?
>
> Currently there's addrx{,1-4}: a LEB128, or a 1-4 byte fixed-length
> representation. Similarly, the offset between addresses could be a
> variety of lengths; high_pc, for example, allows any form of the constant
> class.
>
> Should this support the combination of all of these forms
> (addrx{,1-4}*data{,1-4})? Or is there some better option?
>
> I hope to try prototyping this as a GNU extension form if/when people have
> some idea of what might be best for the representation.
>
> Bonus points: it'd be great to move the address pool into the line table,
> so that it'd really be one relocation per section. That'd need a new
> assembly directive, though. Anyone got ideas on that too? (This would be
> orthogonal to the above improvement; both things are good to do.)

___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
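The "address in the pool, plus a constant offset" idea from the first thread can be made concrete with a small Python sketch. The combined form is hypothetical: DWARF 5 defines DW_FORM_addrx and DW_FORM_addrx1-4, but no index+offset form, and encoding both halves as ULEB128 is only one assumed choice among the addrx{,1-4}*data{,1-4} combinations discussed. All function names are illustrative. The sketch also counts pool entries (one relocation each in .rela.debug_addr) with and without offsets, using section-relative addresses as a stand-in for real low_pc values.

```python
from __future__ import annotations

# Hypothetical "address-pool index + constant offset" form. DWARF 5 has
# DW_FORM_addrx{,1-4} but no combined form; all names here are illustrative.


def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 values must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def uleb128_decode(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one ULEB128 value starting at data[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode_addrx_offset(index: int, offset: int) -> bytes:
    """Assumed encoding: ULEB128 pool index, then ULEB128 byte offset."""
    return uleb128_encode(index) + uleb128_encode(offset)


def resolve_addrx_offset(data: bytes, addr_pool: list[int], pos: int = 0) -> tuple[int, int]:
    """Return (addr_pool[index] + offset, next_pos)."""
    index, pos = uleb128_decode(data, pos)
    offset, pos = uleb128_decode(data, pos)
    return addr_pool[index] + offset, pos


def pool_entries_needed(low_pcs: list[tuple[str, int]], use_offsets: bool) -> int:
    """Each .debug_addr entry costs one relocation in .rela.debug_addr."""
    if use_offsets:
        # One base address per .text section; everything else is base + offset.
        return len({section for section, _ in low_pcs})
    # Without offsets: one pool entry per distinct address.
    return len(set(low_pcs))


if __name__ == "__main__":
    # (section, section-relative address) for subprograms and lexical blocks.
    pcs = [(".text.f", 0x0), (".text.f", 0x40), (".text.f", 0x58), (".text.g", 0x0)]
    print(pool_entries_needed(pcs, use_offsets=False))  # 4
    print(pool_entries_needed(pcs, use_offsets=True))   # 2
    pool = [0x401000]
    print(hex(resolve_addrx_offset(encode_addrx_offset(0, 0x40), pool)[0]))  # 0x401040
```

Swapping ULEB128 for fixed 1-4 byte fields on either half would only change `encode_addrx_offset`/`resolve_addrx_offset`; the relocation count depends solely on how many distinct pool entries remain.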
Re: [Dwarf-Discuss] Including the DWO name in the CU hash
Ping? I did end up hacking around this by hashing the DWO name into the CU
hash if LLVM is producing more than one CU. It's not perfect (really it's
more about the ThinLTO importing stage; normal LTO doesn't need this sort of
mangling), but it suffices for now:
http://llvm.org/viewvc/llvm-project?rev=304119&view=rev

On Fri, May 19, 2017 at 3:51 PM David Blaikie wrote:
> Some context:
>
> 1) A little while ago, I added the dwo_name to the DWO CU to improve
> diagnostic quality on CU hash collisions during a dwp action. Previously
> it could only report which input files (possibly other DWP files)
> contained the duplicates/collision, which could be very manual to trace
> back to the original input DWO files; having the original DWO names in
> the diagnostic made it relatively easy to track down.
>
> 2) LLVM's new ThinLTO presents a high chance of duplicate DWO CUs. It
> does this by creating effectively "new" CUs containing a stripped-down
> version of an existing CU, with only a handful of functions that may be
> relevant to optimizing some other CU. (Imagine two CUs both using a single
> inline function from a 3rd CU: the 3rd CU's inline function and the basic
> CU itself are imported into the compilation steps of the other two CUs, so
> in the end you get two DWO files, each with two CUs, where one CU contains
> only an abstract definition of the inline function.)
>
> My initial thinking here was that I could cross-pollinate the CU hash from
> each CU within a single compilation, since the primary CU would have
> enough uniqueness (hash all the CUs, then cross-hash them).
>
> But then I realized the CUs should already be unique because they include
> the dwo_name, which will be different between the two stripped-down CU
> clones. But the dwo_name isn't included in the hash, so I prototyped
> including it & it does what you'd expect.
>
> Extra wrinkle: once the dwo_name is in the hash, it defeats my original
> motivation for including it in the DWO CU in the first place: such CUs
> will never collide, so the name would never be useful for diagnostic
> quality.
>
> Should I drop the dwo_name from the DWO CU and manually/explicitly include
> it in the hash? Does cross-pollination sound better? Should I only do
> either of these when dealing with more than one CU in a DWO? (In which
> case the diagnostic improvement would still be valid. It catches some
> interesting cases, but they're not /very/ interesting, like major bugs,
> and DWO ID collisions do have some false positives too, which hashing the
> dwo_id would fix. And the mechanism wasn't built for bug catching in any
> case.)
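The workaround from the second thread (mixing the DWO name into the CU hash only when a compilation emits more than one CU) can be sketched as follows. This is an illustration under loud assumptions: a plain truncated MD5 of opaque bytes stands in for the producer's real CU hash (LLVM derives its DWO ID from the CU's DIEs, not from raw bytes), and `dwo_id`/`dwo_ids_for_compile` are hypothetical helpers, not LLVM APIs.

```python
from __future__ import annotations

import hashlib

# Illustrative only: a truncated MD5 over opaque bytes stands in for the
# producer's real DIE-based CU hash.


def dwo_id(cu_contents: bytes, dwo_name: str | None = None) -> int:
    """64-bit CU signature, optionally mixing in the DWO name."""
    h = hashlib.md5()
    h.update(cu_contents)
    if dwo_name is not None:
        # Append the DWO name so identical CU clones in different DWOs differ.
        h.update(b"\0" + dwo_name.encode("utf-8"))
    return int.from_bytes(h.digest()[:8], "little")


def dwo_ids_for_compile(cus: list[bytes], dwo_name: str) -> list[int]:
    """Mix in the DWO name only when one compilation emits more than one CU."""
    name = dwo_name if len(cus) > 1 else None
    return [dwo_id(cu, name) for cu in cus]


if __name__ == "__main__":
    # Two ThinLTO backends both import the same stripped-down CU from c.cpp.
    clone = b"abstract definition of inline_fn, imported from c.cpp"
    a_ids = dwo_ids_for_compile([b"a.cpp primary CU", clone], "a.dwo")
    b_ids = dwo_ids_for_compile([b"b.cpp primary CU", clone], "b.dwo")
    print(dwo_id(clone) == dwo_id(clone))  # True: unsalted clones would collide in a dwp
    print(a_ids[1] == b_ids[1])            # False: the DWO name separates them
```

Single-CU compilations keep their unsalted hash, which is the "only when dealing with more than one CU in a DWO" option; the trade-off raised in the thread is that salted clones can then never collide, so a dwo_name-based collision diagnostic would never fire for them.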
Re: [Dwarf-Discuss] Fission + cross-CU references (ref_addr)
Pinging this thread & bringing it to the attention of Cary & Doug in case
they've got some thoughts. I'd love to get some ideas about this so I can
work on an implementation in GDB (as an extension/prototype).

(I don't mean to dismiss Paul's suggestion, and I generally agree with it,
except where it gets a bit difficult when type units are folded into
debug_info: Paul's suggestion requires some kind of grouping within a
section ("contributions"), which seems like a new constraint/detail I
wouldn't think to have here.)

On another thread Cary talked about the idea of avoiding ref_addr and
expanding the functionality of DW_FORM_ref_sig8 (signature references), and
perhaps generalizing the type unit header to allow multiple hash+offset
pairs. This is a feature Adrian & I had talked about previously (when
looking at the overhead of type units: the need to duplicate member function
declarations in any unit that has a definition of the member function, since
the specific DIE in the type unit couldn't be referred to from the definition
in the CU) & I think it would be great. It would further generalize the unit
header: no need for a difference between TU and CU headers; instead have a
"count" & then that many {hash, offset} pairs for referring to any DIEs from
elsewhere. It might not be the perfect solution to the CU->TU references,
since signatures are large-ish, so it might require a bit more
care/data/etc. I'm also not sure (I don't think) I'd have the time to push
that feature, nice as it is. Paul's cu_index column extension wouldn't be
too hard to implement, I think.

On Tue, May 2, 2017 at 12:09 PM David Blaikie wrote:
> I've recently been trying to resolve the use of Fission in LLVM's ThinLTO
> mode (though this would apply to plain LTO too).
>
> One of the things that happens here is that cross-CU DIE references
> (DW_FORM_ref_addr) are used to describe inlining a function in one CU into
> another CU.
>
> This format has been implemented in LLVM and GCC for years and seems to
> work well outside of Fission.
>
> So the question is: what to do with Fission?
>
> It seemed to me that a good representation would be to produce multiple
> CUs in a single DWO file, which GDB can't yet consume, but I'm working on
> patches to help there. DW_FORM_ref_addr would not use any ELF relocation,
> but would be assumed to be "relative to the chunk of debug_info it was in"
> (within the .dwo file).
>
> But what about DWP files? Currently binutils dwp produces records like
> this (this dwp contains 3 CUs, two from one LTO compile, and one from a
> standalone compile linked in for comparison):
>
>   Index  Signature           INFO      ABBR      LINE      STR_OFF
>   -----  ------------------  --------  --------  --------  --------
>       2  0x7bd765349b7e7631  [2d, 65)  [38, ae)  [11, 22)  [14, 3c)
>       8  0x66f4e160661d2687  [00, 2d)  [00, 38)  [00, 11)  [00, 14)
>      11  0x32dd6d7121dd1d9a  [65, 98)  [38, ae)  [11, 22)  [14, 3c)
>
> So the ABBR/LINE/STR_OFF sections are kept as-is (no analysis is done to
> find which portions of the dwo file are used by which CUs, etc.), but the
> INFO section is fragmented on the CU boundaries. Fragmenting the TYPES
> section on the TU boundaries is necessary/useful for deduplication of
> types, but this fragmenting of the CUs makes it impossible (I think) to
> use ref_addr in a dwp file.
>
> If this fragmenting were not done, consumers (GDB, etc.) would need to
> change to account for it: searching through the INFO range to find the CU
> matching the signature, rather than knowing it starts at the start of the
> INFO range. This could have a noticeable performance impact, especially in
> a full LTO build (where /all/ the CUs would be in the same .dwo, so the
> index would be entirely unhelpful, I think).
>
> Does all this sound right/sane? Anyone have ideas/perspectives/thoughts on
> how this should work?
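The consumer-side cost described in the last thread (scanning an unfragmented INFO contribution for the CU whose signature matches, and resolving DW_FORM_ref_addr relative to that contribution) can be sketched in Python. Assumptions, stated loudly: units use the 32-bit DWARF 5 split_compile header (unit_length, version, unit_type, address_size, debug_abbrev_offset, dwo_id), whereas pre-standard GNU Fission carries the ID as DW_AT_GNU_dwo_id in the CU DIE instead; unit bodies are opaque bytes; and `build_split_cu`, `find_cu`, and `resolve_ref_addr` are hypothetical helpers, not GDB or binutils code.

```python
from __future__ import annotations

import struct

DW_UT_split_compile = 0x05

# Header fields after the 4-byte unit_length (32-bit DWARF 5 split unit):
# version(2) unit_type(1) address_size(1) debug_abbrev_offset(4) dwo_id(8)
_HEADER_REST = "<HBBIQ"


def build_split_cu(dwo_id: int, body: bytes, abbrev_offset: int = 0) -> bytes:
    """Build a minimal split_compile unit: header followed by an opaque body."""
    rest = struct.pack(_HEADER_REST, 5, DW_UT_split_compile, 8, abbrev_offset, dwo_id) + body
    return struct.pack("<I", len(rest)) + rest


def find_cu(info: bytes, start: int, end: int, wanted_id: int) -> int | None:
    """Linearly scan info[start:end) for the unit whose dwo_id matches.

    With a fragmented dwp index this scan is unnecessary (each CU starts at
    its own INFO range); with one shared contribution it is required.
    """
    pos = start
    while pos < end:
        (unit_length,) = struct.unpack_from("<I", info, pos)
        if unit_length >= 0xFFFFFFF0:
            raise ValueError("64-bit DWARF / reserved lengths not handled in this sketch")
        version, unit_type, _addr_size, _abbrev_off, unit_id = struct.unpack_from(
            _HEADER_REST, info, pos + 4
        )
        if version == 5 and unit_type == DW_UT_split_compile and unit_id == wanted_id:
            return pos
        pos += 4 + unit_length  # skip to the next unit header
    return None


def resolve_ref_addr(contribution_start: int, ref: int) -> int:
    """Assumed semantics from the thread: ref_addr is relative to the start of
    the unfragmented .debug_info contribution, not to a single CU."""
    return contribution_start + ref


if __name__ == "__main__":
    # Two CUs from one LTO compile sharing a single, unfragmented INFO range.
    cu_a = build_split_cu(0x7BD765349B7E7631, b"\x01\x02\x03")
    cu_b = build_split_cu(0x32DD6D7121DD1D9A, b"\x04\x05")
    info = cu_a + cu_b
    print(find_cu(info, 0, len(info), 0x32DD6D7121DD1D9A) == len(cu_a))  # True
```

The scan is linear in the number of CUs sharing the contribution, which is the performance concern raised for full LTO, where every CU would land in one range and the index would no longer narrow the search.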