On Tue, Sep 15, 2020 at 10:13 AM Robinson, Paul via Dwarf-Discuss <dwarf-discuss@lists.dwarfstd.org> wrote:
>
> David Blaikie has brought this up with me (or in conversations that
> I observed) a couple of times:

Thanks for bringing this up! Not sure if I've raised this on dwarf-discuss specifically before.. ah, yeah, 3 years ago:
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-June/004378.html
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-July/thread.html#4380
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-August/004393.html

Most recently I had an idea for a workaround that I proposed on the llvm-dev mailing list:
https://groups.google.com/g/llvm-dev/c/g3eGxhi4ATU/m/fbrBPFxNBwAJ
The idea being that actually using debug_rnglists even for contiguous ranges would reduce .o/executable file size when using Split DWARF. I think the data I had even showed breakeven for non-split DWARF object files, probably slight growth for linked executables in that case, though.

> It's common to want to refer to a particular address plus an offset,
> for example for DW_AT_low_pc or DW_AT_ranges to describe a lexical
> block or inlined subprogram within another subprogram.

Yep - the ones I'm especially interested in now are those that won't be addressed even by a "ranges everywhere" approach (though that approach does have size tradeoffs that I'd like to avoid/improve on too, for sure!) - DW_TAG_call_site's DW_AT_call_pc/DW_AT_call_return_pc and DW_TAG_label's DW_AT_low_pc. The latter isn't super common in code I'm dealing with, but the former is pretty ubiquitous now.

> Generally
> the only symbolic address available is the entry point of the
> containing subprogram. Back when addresses were held directly in
> the .debug_info section, the attributes would have relocations, the
> offset would be encoded into the relocation and the linker would
> just do the right thing.
>
> With DWARF v5, we now have the .debug_addr section, which contains
> the addresses to be fixed up by the linker.
> But, we don't have a
> way to specify an offset to add to an entry in the .debug_addr
> section; instead, each unique addr+offset requires its own entry
> in the .debug_addr table. This consumes additional space, these
> entries are generally not reusable, and it doesn't reduce the
> overall number of relocations that the linker must process.

If you're encountering size penalties with non-split DWARFv5 due to debug_addr indirection - we could change LLVM to choose which addresses to indirect and which ones to emit using the classic/DWARFv4-esque representation. (But, yeah, overall, I think it's better for lots of use cases to support an addr+offset encoding)

> It's not feasible to define a new attribute for address+offset,
> because an attribute has only one value, and the attribute would
> have to specify both the .debug_addr index and the offset to add.

I don't follow this ^ - I think previously we've discussed at least 2 representations that could do this:

  uleb+uleb
  generalized exprloc support

Admittedly uleb+uleb has the problem that it's a variable-length encoding, but at least LLVM currently is using addrx exclusively, and not the addrxN fixed length encodings.

> But, we could define an "indirect" entry in .debug_addr, and then
> reference it with an attribute in the same way that we reference
> any other .debug_addr entry.

This direction would, for my use case, be unfortunate - since my goal is to remove as much DWARF from object files as possible under Split DWARF - so leaving anything extra in debug_addr works against that goal.

> An indirect entry would be the same size as all other entries in
> .debug_addr (i.e., the size of an address on the target). The
> upper half would be another index into .debug_addr and the lower
> half would be the addend. The consumer adds the addend to the
> value from the entry specified by the "another index."

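For reference, here is roughly how a consumer might resolve the indirect entry described in the quoted paragraph above. This is only a sketch: the 8-byte address size, the unsigned addend, and the helper names `make_indirect` and `resolve_indirect` are my assumptions for illustration, and the proposal doesn't yet say how a consumer would tell an indirect entry from a plain address, so the caller is simply assumed to know.

```python
# Assumption: 64-bit target, so an entry is 8 bytes and each half is 32 bits.
ADDR_SIZE = 8
HALF_BITS = ADDR_SIZE * 8 // 2
HALF_MASK = (1 << HALF_BITS) - 1


def make_indirect(base_index, addend):
    """Pack a hypothetical indirect entry: upper half index, lower half addend."""
    if not 0 <= base_index <= HALF_MASK:
        raise ValueError("index does not fit in the upper half")
    if not 0 <= addend <= HALF_MASK:
        raise ValueError("addend does not fit in the lower half (unsigned assumed)")
    return (base_index << HALF_BITS) | addend


def resolve_indirect(debug_addr, entry):
    """Add the addend to the value of the .debug_addr entry named by the index."""
    base_index = entry >> HALF_BITS
    addend = entry & HALF_MASK
    return debug_addr[base_index] + addend


# debug_addr stands in for the already-relocated address table.
debug_addr = [0x401000, 0x402000]
entry = make_indirect(1, 0x2C)
print(hex(resolve_indirect(debug_addr, entry)))
```

Note that with a 4-byte address size the same split would leave only 16 bits each for the index and the addend, which is part of why the encoding width matters here.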
If it's OK to use such a small fixed length encoding (addrx supports variable length with fixed lengths of 1/2/3/4 - offsets in LLVM are emitted as data4) then we could introduce that as the FORM_addrx4_offset4 form (or could make it variable length depending on pointer size - but that seems less relevant when it's not in the debug_addr section) and a uleb+uleb form, without providing all the possible combinations of addrx{1,2,3,4,N}_offset{1,2,3,4,M}.

In any case, I think of these forms as sort of special case/compact/easier to parse encodings of the generalized exprloc (DW_OP_addrx(N), DW_OP_constu(M), DW_OP_plus).

>
> This solution doesn't save space in .debug_addr, but it does
> reduce the number of relocations. Ideally .debug_addr would
> require only one relocation per function.
>
> We can debate whether the addend should be signed or unsigned,
> and whether the indirect entries should be a separate subtable,
> but I wanted to float the idea here before I wrote it up as a
> proposal.

I'd be fairly in favor of unsigned. Generally LLVM already picks the first address used by DWARF in any ELF section as the address to put in the pool - trying to make everything it can relative to that. (So, eg: if you have DW_AT_ranges on a DW_TAG_lexical_block in your function, the rnglist for that will set a base address of the start of the function (say, assuming function sections) and use offset pairs relative to that.)

> Alternatively, the indirect sub-table could be encoded with
> ULEB/SLEB pairs, but that makes it hard to find them by index.
> They could be found by a direct reference, but that requires a
> relocation from .debug_info to .debug_addr, so we haven't saved
> any relocations that way.
>
> If there are obvious flaws I can't see, or someone is inspired
> to come up with another solution, please let me know! Otherwise
> I'll write it up as a formal proposal probably later this week.
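
To make the "compact encoding of the generalized exprloc" point concrete, here is a small sketch comparing the two representations. The opcode values are the DWARF v5 ones; the uleb+uleb form is hypothetical (no such form exists in the standard today), and the function names are made up for illustration. Wraparound at the target address size is ignored.

```python
# DWARF v5 operation encodings used by the equivalent exprloc.
DW_OP_constu = 0x10
DW_OP_plus = 0x22
DW_OP_addrx = 0xA1


def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data, pos=0):
    """Decode one ULEB128 value; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def eval_exprloc(expr, debug_addr):
    """Evaluate only the three operations needed for addr+offset."""
    stack, pos = [], 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if op == DW_OP_addrx:
            index, pos = decode_uleb128(expr, pos)
            stack.append(debug_addr[index])
        elif op == DW_OP_constu:
            value, pos = decode_uleb128(expr, pos)
            stack.append(value)
        elif op == DW_OP_plus:
            rhs = stack.pop()
            stack.append(stack.pop() + rhs)
        else:
            raise ValueError(f"unsupported operation 0x{op:02x}")
    return stack[-1]


def eval_uleb_pair(form_bytes, debug_addr):
    """Hypothetical uleb+uleb form: .debug_addr index, then unsigned offset."""
    index, pos = decode_uleb128(form_bytes)
    offset, _ = decode_uleb128(form_bytes, pos)
    return debug_addr[index] + offset


debug_addr = [0x401000, 0x402000]
index, offset = 1, 300
expr = (bytes([DW_OP_addrx]) + encode_uleb128(index)
        + bytes([DW_OP_constu]) + encode_uleb128(offset)
        + bytes([DW_OP_plus]))
pair = encode_uleb128(index) + encode_uleb128(offset)
print(hex(eval_exprloc(expr, debug_addr)), len(expr))
print(hex(eval_uleb_pair(pair, debug_addr)), len(pair))
```

Both paths yield the same address; the pair form just drops the three opcode bytes (and, as an exprloc attribute value, the expression would also carry a length prefix).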
>
> Thanks,
> --paulr
>
> _______________________________________________
> Dwarf-Discuss mailing list
> Dwarf-Discuss@lists.dwarfstd.org
> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org