[Dwarf-Discuss] .debug_addr entry plus offset
David Blaikie has brought this up with me (or in conversations that I observed) a couple of times:

It's common to want to refer to a particular address plus an offset, for example for DW_AT_low_pc or DW_AT_ranges to describe a lexical block or inlined subprogram within another subprogram. Generally the only symbolic address available is the entry point of the containing subprogram. Back when addresses were held directly in the .debug_info section, the attributes would have relocations, the offset would be encoded into the relocation and the linker would just do the right thing.

With DWARF v5, we now have the .debug_addr section, which contains the addresses to be fixed up by the linker. But, we don't have a way to specify an offset to add to an entry in the .debug_addr section; instead, each unique addr+offset requires its own entry in the .debug_addr table. This consumes additional space, these entries are generally not reusable, and it doesn't reduce the overall number of relocations that the linker must process.

It's not feasible to define a new attribute for address+offset, because an attribute has only one value, and the attribute would have to specify both the .debug_addr index and the offset to add. But, we could define an "indirect" entry in .debug_addr, and then reference it with an attribute in the same way that we reference any other .debug_addr entry.

An indirect entry would be the same size as all other entries in .debug_addr (i.e., the size of an address on the target). The upper half would be another index into .debug_addr and the lower half would be the addend. The consumer adds the addend to the value from the entry specified by the "another index."

This solution doesn't save space in .debug_addr, but it does reduce the number of relocations. Ideally .debug_addr would require only one relocation per function.

We can debate whether the addend should be signed or unsigned, and whether the indirect entries should be a separate subtable, but I wanted to float the idea here before I wrote it up as a proposal.

Alternatively, the indirect sub-table could be encoded with ULEB/SLEB pairs, but that makes it hard to find them by index. They could be found by a direct reference, but that requires a relocation from .debug_info to .debug_addr, so we haven't saved any relocations that way.

If there are obvious flaws I can't see, or someone is inspired to come up with another solution, please let me know! Otherwise I'll write it up as a formal proposal probably later this week.

Thanks,
--paulr
___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
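The "indirect entry" layout described in the message above can be sketched roughly as follows. This is only an illustration under assumptions the message leaves open: an 8-byte target address (so each half is 32 bits), an unsigned addend, and indirect entries kept in a separate subtable so that no tag bit is needed to tell them apart from ordinary entries. The names `pack_indirect` and `resolve_indirect` are hypothetical and not part of any DWARF specification or tool.

```python
ADDR_SIZE = 8              # assumed bytes per .debug_addr entry on the target
HALF_BITS = ADDR_SIZE * 4  # an entry splits into two equal halves
HALF_MASK = (1 << HALF_BITS) - 1


def pack_indirect(base_index: int, addend: int) -> int:
    """Producer side: index in the upper half, addend in the lower half.

    The addend is assumed unsigned here; the thread leaves that open.
    """
    if not 0 <= base_index <= HALF_MASK:
        raise ValueError("base index does not fit in the upper half")
    if not 0 <= addend <= HALF_MASK:
        raise ValueError("addend does not fit in the lower half")
    return (base_index << HALF_BITS) | addend


def resolve_indirect(direct: list[int], entry: int) -> int:
    """Consumer side: fetch the referenced direct entry, then add the addend."""
    base_index = entry >> HALF_BITS
    addend = entry & HALF_MASK
    return direct[base_index] + addend


# Relocated entries: one per function entry point (values are made up).
direct_table = [0x401000, 0x402000]
# Link-time constants that need no relocation: a lexical block at
# function 0 + 0x24 and a call site at function 1 + 0x10.
indirect_table = [pack_indirect(0, 0x24), pack_indirect(1, 0x10)]

resolved = [resolve_indirect(direct_table, e) for e in indirect_table]
```

In this sketch only `direct_table` would carry relocations; the indirect entries still occupy one address-sized slot each, which matches the observation that the scheme saves relocations rather than space.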
Re: [Dwarf-Discuss] .debug_addr entry plus offset
On Tue, Sep 15, 2020 at 10:13 AM Robinson, Paul via Dwarf-Discuss wrote:
>
> David Blaikie has brought this up with me (or in conversations that
> I observed) a couple of times:

Thanks for bringing this up! Not sure if I've raised this on dwarf-discuss specifically before.. ah, yeah, 3 years ago:
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-June/004378.html
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-July/thread.html#4380
http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-August/004393.html

Most recently I had an idea for a workaround that I proposed on the llvm-dev mailing list: https://groups.google.com/g/llvm-dev/c/g3eGxhi4ATU/m/fbrBPFxNBwAJ
The idea being that actually using debug_rnglists even for contiguous ranges would reduce .o/executable file size when using Split DWARF. I think the data I had even showed breakeven for non-split DWARF object files, probably slight growth for linked executables in that case, though.

> It's common to want to refer to a particular address plus an offset,
> for example for DW_AT_low_pc or DW_AT_ranges to describe a lexical
> block or inlined subprogram within another subprogram.

Yep - the ones I'm especially interested in now, are those that won't be addressed even by a "ranges everywhere" approach (though that approach does have size tradeoffs that I'd like to avoid/improve on too, for sure!) - DW_TAG_call_site's DW_AT_call_pc/DW_AT_call_return_pc and DW_TAG_label's DW_AT_low_pc. The latter isn't super common in code I'm dealing with, but the former is pretty ubiquitous now.

> Generally
> the only symbolic address available is the entry point of the
> containing subprogram. Back when addresses were held directly in
> the .debug_info section, the attributes would have relocations, the
> offset would be encoded into the relocation and the linker would
> just do the right thing.
>
> With DWARF v5, we now have the .debug_addr section, which contains
> the addresses to be fixed up by the linker. But, we don't have a
> way to specify an offset to add to an entry in the .debug_addr
> section; instead, each unique addr+offset requires its own entry
> in the .debug_addr table. This consumes additional space, these
> entries are generally not reusable, and it doesn't reduce the
> overall number of relocations that the linker must process.

If you're encountering size penalties with non-split DWARFv5 due to debug_addr indirection - we could change LLVM to choose which addresses to indirect and which ones to use the classic/DWARFv4-esque representations. (But, yeah, overall, I think it's better for lots of use cases to support an addr+offset encoding)

> It's not feasible to define a new attribute for address+offset,
> because an attribute has only one value, and the attribute would
> have to specify both the .debug_addr index and the offset to add.

I don't follow this ^ - I think previously we've discussed at least 2 representations that could do this:
  uleb+uleb
  generalized exprloc support
admittedly uleb+uleb has the problem that it's a variable-length encoding, but at least LLVM currently is using addrx exclusively, and not the addrxN fixed length encodings.

> But, we could define an "indirect" entry in .debug_addr, and then
> reference it with an attribute in the same way that we reference
> any other .debug_addr entry.

This direction would, for my use case, be unfortunate - since my goal is to remove as much DWARF from object files as possible under Split DWARF - so leaving anything extra in debug_addr works against that goal.

> An indirect entry would be the same size as all other entries in
> .debug_addr (i.e., the size of an address on the target). The
> upper half would be another index into .debug_addr and the lower
> half would be the addend. The consumer adds the addend to the
> value from the entry specified by the "another index."

If it's OK to use such a small fixed length encoding (addrx supports variable length with fixed lengths of 1/2/3/4 - offsets in LLVM are emitted as data4) then we could introduce that as the FORM_addrx4_offset4 (or could make it variable length depending on pointer size - but that seems less relevant when it's not in the debug_addr section) form and a uleb+uleb form, without providing all the possible combinations of addrx{1,2,3,4,N}_offset{1,2,3,4,M}.

In any case, I think of these forms as sort of special case/compact/easier to parse encodings of the generalized exprloc (DW_OP_addrx(N), DW_OP_constu(M), DW_OP_plus).

>
> This solution doesn't save space in .debug_addr, but it does
> reduce the number of relocations. Ideally .debug_addr would
> require only one relocation per function.
>
> We can debate whether the addend should be signed or unsigned,
> and whether the indirect entries should be a separate subtable,
> but I wanted to float the idea here before I wrote it up as a
> proposal.

I'd be fairly in favor of unsigned. Generally LLVM already pick
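To make the relationship between a uleb+uleb form and the generalized exprloc concrete, here is a small sketch, assuming a hypothetical form whose value is simply a ULEB128 `.debug_addr` index followed by a ULEB128 unsigned offset. No such form exists in DWARF v5, and the helper names are invented for illustration. The opcode values are the DWARF v5 ones for DW_OP_constu, DW_OP_plus and DW_OP_addrx; the evaluator handles only those three operations and ignores address-size wraparound.

```python
DW_OP_constu = 0x10
DW_OP_plus = 0x22
DW_OP_addrx = 0xA1


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one ULEB128 value at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def eval_pair_form(data: bytes, debug_addr: list[int]) -> int:
    """Hypothetical uleb+uleb form: <addrx index> <unsigned offset>."""
    index, pos = decode_uleb128(data, 0)
    offset, _ = decode_uleb128(data, pos)
    return debug_addr[index] + offset


def eval_exprloc(expr: bytes, debug_addr: list[int]) -> int:
    """Tiny stack evaluator covering only the three opcodes used here."""
    stack, pos = [], 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if op == DW_OP_addrx:
            index, pos = decode_uleb128(expr, pos)
            stack.append(debug_addr[index])
        elif op == DW_OP_constu:
            value, pos = decode_uleb128(expr, pos)
            stack.append(value)
        elif op == DW_OP_plus:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        else:
            raise ValueError(f"unsupported opcode {op:#x}")
    return stack.pop()


debug_addr = [0x401000, 0x402000]  # made-up relocated addresses
pair = encode_uleb128(1) + encode_uleb128(0x1234)
expr = (bytes([DW_OP_addrx]) + encode_uleb128(1)
        + bytes([DW_OP_constu]) + encode_uleb128(0x1234)
        + bytes([DW_OP_plus]))
```

Both encodings name the same address here; the pair form just drops the three opcode bytes (3 bytes against 6 in this example) and needs no expression evaluator.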
Re: [Dwarf-Discuss] .debug_addr entry plus offset
One simple approach would be to be able to represent a DW_AT_low_pc with a DW_FORM_data encoding just like the DW_AT_high_pc does when it is an offset from the DW_AT_low_pc.

The value of the DW_AT_low_pc would be an offset from either:
1 - the parent DIE's DW_AT_low_pc (which itself might need to be resolved by looking at the parent scope). If the parent DIE's range is a DW_AT_ranges, then use the lowest address out of all of them.
2 - the first parent DIE with a DW_AT_low_pc that has a DW_FORM_addrXXX encoding.

Solution #1 is nice because it keeps the offset in the DW_FORM_data encoding small since it is always relative to the first parent scope's DW_AT_low_pc. So this could save a lot of space in the DWARF if we use the smallest possible DW_FORM_data encoding all the time.
Solution #2 could be easier as you would traverse parent scopes looking for an address encoding as the DW_FORM.

This would allow DW_TAG_subprogram DIEs to have a single relocation on the DW_AT_low_pc.

Greg Clayton
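The two resolution rules in the message above can be sketched on a toy DIE tree. This is a simplified model, not consumer code from any debugger: the `Die` class and both function names are hypothetical, every DIE is assumed to carry a low_pc, and the DW_AT_ranges case ("use the lowest address") is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Die:
    tag: str
    low_pc: int              # absolute address, or an offset when is_offset
    is_offset: bool = False  # True models low_pc in a DW_FORM_data* form
    parent: Optional["Die"] = None


def resolve_nearest_parent(die: Die) -> int:
    """Option 1: the offset is relative to the parent's resolved low_pc."""
    if not die.is_offset:
        return die.low_pc
    if die.parent is None:
        raise ValueError("offset-form low_pc needs a parent to be relative to")
    return resolve_nearest_parent(die.parent) + die.low_pc


def resolve_first_addressed(die: Die) -> int:
    """Option 2: the offset is relative to the first ancestor whose low_pc
    uses an address form."""
    if not die.is_offset:
        return die.low_pc
    ancestor = die.parent
    while ancestor is not None and ancestor.is_offset:
        ancestor = ancestor.parent
    if ancestor is None:
        raise ValueError("no ancestor carries an address-form low_pc")
    return ancestor.low_pc + die.low_pc


# One relocated address on the subprogram; nested scopes store offsets only.
func = Die("DW_TAG_subprogram", 0x401000)
outer = Die("DW_TAG_lexical_block", 0x20, is_offset=True, parent=func)
# The same inner block (absolute 0x401028) needs a different stored offset
# under each rule: 0x08 from the outer block, or 0x28 from the subprogram.
inner_opt1 = Die("DW_TAG_lexical_block", 0x08, is_offset=True, parent=outer)
inner_opt2 = Die("DW_TAG_lexical_block", 0x28, is_offset=True, parent=outer)
```

The example shows the trade-off named in the message: option 1 keeps each stored offset small but resolves recursively, while option 2 stops at a single ancestor at the cost of larger offsets.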
Re: [Dwarf-Discuss] .debug_addr entry plus offset
On Tue, Sep 15, 2020 at 2:47 PM Greg Clayton via Dwarf-Discuss wrote:
>
> One simple approach would be to be able to represent a DW_AT_low_pc with a
> DW_FORM_data encoding just like the DW_AT_high_pc does when it is an offset
> from the DW_AT_low_pc.

I'm not sure this would catch all the desired cases/be especially tidy to implement, unfortunately.

> The value of the DW_AT_low_pc would be an offset from either:
> 1 - the parent DIE's DW_AT_low_pc (which itself might need to be resolved by
> looking at the parent scope). If the parent DIE's range is a DW_AT_ranges,
> then use the lowest address out of all of them.

"lowest address in DW_AT_ranges" wouldn't be suitable when ranges are used across sections (eg: some CU ranges - when functions are in different sections due to inline functions or -ffunction-sections). If everything was in one section then an implementation could use low_pc to indicate a good base address even if they still needed DW_AT_ranges (eg:

  void f1() { }
  __attribute__((nodebug)) void f2() { }
  void f3() { }

- or other cases where a single section with multiple hunks of debug info could exist with holes in between) - but it's possible to have that and ranges. eg:

  // compiled without function sections, so f1 is in one section, but f2
  // and f3 are in a single section together, separate from f1
  inline void f1() { }
  void f2() { f1(); }
  void f3() { }

the low_pc of f3 could benefit from using the same address (+offset) as the low_pc of f2 - but there would be no clear way to indicate which part of the CU's DW_AT_ranges could be used as the base address for 'f3'.

> 2 - the first parent DIE with a DW_AT_low_pc that has a DW_FORM_addrXXX
> encoding.

Similarly, in the example above, 'f3' has no parent with a suitable low_pc, but would benefit from sharing the same debug_addr entry as 'f2'.
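The cross-section objection can be illustrated with a small model of pre-link addresses as (section, offset) pairs. This is a hypothetical sketch: the section names, offsets and helper names are invented for the f1/f2/f3 example, and the only assumption it encodes is that an offset between two addresses is a link-time constant solely when both lie in the same section.

```python
from typing import NamedTuple, Optional


class SymAddr(NamedTuple):
    section: str  # section the address lives in
    offset: int   # byte offset within that section


def offset_from(base: SymAddr, target: SymAddr) -> Optional[int]:
    """Return target - base if it is known before linking, else None."""
    if base.section != target.section:
        return None  # the linker places sections independently
    return target.offset - base.offset


def usable_bases(target: SymAddr, candidates: list[SymAddr]) -> list[SymAddr]:
    """Candidate base addresses against which target has a constant offset."""
    return [b for b in candidates if offset_from(b, target) is not None]


# Made-up layout: inline f1 in its own section, f2 and f3 together in .text.
f1 = SymAddr(".text._Z2f1v", 0x00)
f2 = SymAddr(".text", 0x00)
f3 = SymAddr(".text", 0x20)

# Start addresses of the CU's DW_AT_ranges entries in this model.
cu_range_starts = [f1, f2]
```

In this model f3 can only be expressed relative to f2's range, and nothing in a "lowest address of the parent's DW_AT_ranges" rule identifies that range; which range ends up lowest is not even known until the linker has placed the sections.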
A more extreme example happens in LLVM's prototype "Propeller" feature - which essentially is "basic block sections" - where even a single function may be fragmented across multiple sections and have no specific ordering/scope based hierarchy about which base address to use (so the function would have DW_AT_ranges, not just DW_AT_low/high - and some internal scope could have a contiguous range and would want to reuse one of the addresses used in DW_AT_ranges (+an offset from it)).

- Dave

> Solution #1 is nice because it keeps the offset in the DW_FORM_data encoding
> small since it is always relative to the first parent scope's DW_AT_low_pc.
> So this could save a lot of space in the DWARF if we use the smallest
> possible DW_FORM_data encoding all the time.
> Solution #2 could be easier as you would traverse parent scopes looking for
> an address encoding as the DW_FORM.
>
> This would allow DW_TAG_subprogram DIEs to have a single relocation on the
> DW_AT_low_pc.
>
> Greg Clayton
>
>
> > On Sep 15, 2020, at 10:12 AM, Robinson, Paul via Dwarf-Discuss
> > wrote:
> >
> > David Blaikie has brought this up with me (or in conversations that
> > I observed) a couple of times:
> >
> > It's common to want to refer to a particular address plus an offset,
> > for example for DW_AT_low_pc or DW_AT_ranges to describe a lexical
> > block or inlined subprogram within another subprogram. Generally
> > the only symbolic address available is the entry point of the
> > containing subprogram. Back when addresses were held directly in
> > the .debug_info section, the attributes would have relocations, the
> > offset would be encoded into the relocation and the linker would
> > just do the right thing.
> >
> > With DWARF v5, we now have the .debug_addr section, which contains
> > the addresses to be fixed up by the linker. But, we don't have a
> > way to specify an offset to add to an entry in the .debug_addr
> > section; instead, each unique addr+offset requires its own entry
> > in the .debug_addr table. This consumes additional space, these
> > entries are generally not reusable, and it doesn't reduce the
> > overall number of relocations that the linker must process.
> >
> > It's not feasible to define a new attribute for address+offset,
> > because an attribute has only one value, and the attribute would
> > have to specify both the .debug_addr index and the offset to add.
> > But, we could define an "indirect" entry in .debug_addr, and then
> > reference it with an attribute in the same way that we reference
> > any other .debug_addr entry.
> >
> > An indirect entry would be the same size as all other entries in
> > .debug_addr (i.e., the size of an address on the target). The
> > upper half would be another index into .debug_addr and the lower
> > half would be the addend. The consumer adds the addend to the
> > value from the entry specified by the "another index."
> >
> > This solution doesn't save space in .debug_addr, but it does
> > reduce the number of relocations. Ideally .debug_addr would
> > require only one relocation per function.
> >
> > We can debate whether the addend should be signed or