Paul, I haven't needed to contend with this issue. But as I was looking over the standard, this was my initial gut reaction too: use the segment selectors. This use actually does seem like it's a characteristic of the target architecture to me. You started the discussion with "Harvard architectures".
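To make that concrete, here is a rough sketch of how a consumer might cope, assuming the 0=code, 1=data convention Paul suggests in the message quoted below. Everything in it is hypothetical: the function names, the little-endian byte order, and the example ranges are mine, and it skips the .debug_aranges header and padding entirely. It only decodes (segment, address, length) tuples and shows that a lookup which ignores the selector still works, just with less precision.

```python
# Hypothetical selector values from the quoted proposal; DWARF itself
# does not define them.
SEG_CODE = 0
SEG_DATA = 1


def decode_aranges_tuples(data, address_size, segment_selector_size):
    """Decode (segment, address, length) tuples from little-endian bytes.

    Assumes `data` starts at the first tuple (header and padding already
    skipped) and stops at the all-zero terminator tuple.
    """
    tuples = []
    step = segment_selector_size + 2 * address_size
    for off in range(0, len(data) - step + 1, step):
        # With segment_selector_size == 0 the slice is empty and decodes to 0.
        seg = int.from_bytes(data[off:off + segment_selector_size], "little")
        a = off + segment_selector_size
        address = int.from_bytes(data[a:a + address_size], "little")
        length = int.from_bytes(
            data[a + address_size:a + 2 * address_size], "little")
        if seg == 0 and address == 0 and length == 0:
            break  # terminator
        tuples.append((seg, address, length))
    return tuples


def lookup(tuples, address, segment=None):
    """Return the tuples covering `address`.

    A consumer that ignores segments passes segment=None and simply gets
    every match, code or data.
    """
    return [t for t in tuples
            if t[1] <= address < t[1] + t[2]
            and (segment is None or t[0] == segment)]


# Made-up example: a code range and a data range at the same numeric
# address, which is exactly the Harvard-architecture ambiguity.
example = (
    bytes([SEG_CODE]) + (0x100).to_bytes(4, "little") + (0x40).to_bytes(4, "little")
    + bytes([SEG_DATA]) + (0x100).to_bytes(4, "little") + (0x10).to_bytes(4, "little")
    + bytes(9)  # all-zero terminator tuple
)
ranges = decode_aranges_tuples(example, address_size=4, segment_selector_size=1)
```

With the selector, lookup(ranges, 0x108, SEG_CODE) returns only the code range. Without it, lookup(ranges, 0x108) returns both, which is the same ambiguity a consumer faces today.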
DWARF does permit architectures to specify aspects of their DWARF description, after all. I can't recall it ever being done *formally*, but it's been done informally for every architecture that uses DWARF. At a bare minimum, register encodings. And usually you have to root around in somebody else's source code to find it.

This one has a slightly higher chance of breaking a consumer, if that consumer was written not to tolerate the segment selectors. But I think it would be fair to put any such blame on the consumer in that case. If the consumer doesn't die with a SIGSEGV, then it might ignore the segments. And then it would be no worse off than now.

On Thu, Mar 19, 2020 at 06:05:16PM +0000, Dwarf Discussion wrote:
> This recently came up in the LLVM project. Harvard architectures
> put code and data into separate address spaces, but those spaces
> are not explicit; instructions that load/store memory implicitly
> use the data space, while things like taking a function address or
> doing indirect branches will implicitly use the code space. This
> doubles the effective size of memory without consuming an address
> bit, as well as having other secondary benefits like not allowing
> self-modifying code.
>
> Nearly all of the DWARF information does not need to distinguish
> between code and address spaces, because it's easy to derive that
> from context. Addresses in the line table or a range list will be
> code addresses; in .debug_info, addresses of code elements will be
> code addresses, while variables will be data addresses. And so on.
>
> This only seems to break down in the .debug_aranges section, which
> records both data and code addresses without any context to let a
> consumer know which is what. In a flat-address architecture, no
> distinction is needed; in a segmented architecture, there will be
> a segment selector as part of any address, and that includes the
> .debug_aranges section. What about for Harvard architectures?
>
> What I suggested in the LLVM project is that .debug_aranges would
> have a 1-byte segment selector and use some trivial scheme such as
> 0=code, 1=data to distinguish what kind of address it is. Other
> DWARF sections wouldn't need a selector because they can all use
> context to figure it out; this avoids the size overhead of using
> segment selectors everywhere else.
>
> Pavel Labath pointed out that this seems inconsistent and might
> make consumers unhappy; segment selectors are described as a
> characteristic of the target architecture, so having them in one
> place and not others might look suspicious. IMO it's a reasonable
> "permissive" use of the existing DWARF structures, but it seemed
> worth asking here.
>
> Does this (segment selector only in .debug_aranges) sound okay?
> Should there be non-normative text or a wiki description of this?
> Do we want to codify the 0=code 1=data use of segment selectors
> for all Harvard architectures (that don't otherwise have explicit
> segments) so that this doesn't have to be set by ABI committees?
>
> I'm willing to write up whatever needs writing up, either as a
> proposal or as a wiki entry.
>
> Thanks,
> --paulr
>
> _______________________________________________
> Dwarf-Discuss mailing list
> Dwarf-Discuss@lists.dwarfstd.org
> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org

-- 
Todd Allen
Concurrent Real-Time