I think it's pretty safe to say:

- A reference into a TU from a CU or a different TU is invariably by
ref_sig8, never by section offset.

- A reference into a CU from another CU has to be by ref_addr; in a .o
file this can use a relocation, in a .dwo file it has to be from inside the
same .debug_info contribution.

- A reference into a CU from a TU is not allowed, even if the TU lives
in the same .debug_info contribution.

I don't have my hands on words in the document that say these things, but I am
quite sure that's the intent.  It's not important whether object-file mechanics 
would allow you to do the things that aren't allowed above.
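
As a minimal sketch, the three rules above can be written as a single predicate over cross-unit references. The names `Unit` and `reference_allowed`, and the `same_contribution` / `has_relocations` flags, are hypothetical and exist only for this illustration; nothing here is wording from the DWARF document.

```python
from enum import Enum


class Unit(Enum):
    CU = "compile unit"
    TU = "type unit"


def reference_allowed(src, dst, form, same_contribution=True,
                      has_relocations=False):
    """Return True if a cross-unit reference fits the three rules above.

    has_relocations models a .o file; a .dwo file has none.
    """
    if dst is Unit.TU:
        # Into a TU: only by signature, never by section offset.
        return form == "DW_FORM_ref_sig8"
    if src is Unit.TU:
        # TU -> CU: not allowed, even within one contribution.
        return False
    # CU -> CU: DW_FORM_ref_addr. A .o can rely on a relocation; a .dwo
    # cannot, so both CUs must share a .debug_info contribution.
    return form == "DW_FORM_ref_addr" and (has_relocations or same_contribution)
```

The predicate only encodes intent as stated in this message; whether object-file mechanics would permit more is deliberately left out.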

The rest of what I'm suggesting all follows (reiterating for clarity):

- If a .dwo file has multiple split-full CUs, they each have a unique
DWO ID (so the index can describe them individually).

- Therefore, the corresponding .o has a distinct corresponding skeleton
CU for each split-full CU.

- Cross-CU references within a .dwo file are by DW_FORM_ref_addr and the
related CUs must be in the same .debug_info contribution.

- Split-full CUs without cross-CU references can be in separate
.debug_info contributions within the .dwo file.

- A packager should look for multiple CUs in a .debug_info contribution,
be willing to create an index entry for each one, and not split up the
contribution even if one or more of the CUs has already been included from
elsewhere.

- A packager can drop an entire .debug_info contribution if *all* of the
CUs in that contribution have been included from elsewhere.  (This trivially
covers the one-CU-per-contribution case.)

- The package index should get a new column to describe the entire
.debug_info contribution containing the CU, so that consumers can know how to
resolve DW_FORM_ref_addr.
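
The packager rules in the list above could be sketched roughly as follows. This is an illustration under assumptions, not a description of any existing tool: `Contribution` and `package` are hypothetical names, each contribution is modeled only by the DWO IDs of its CUs and its size, and the index here records just the proposed per-contribution column (offset and size of the whole contribution), not each CU's own range.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    dwo_ids: list  # DWO IDs of the split-full CUs in this contribution
    size: int      # size in bytes of the whole contribution


def package(contributions):
    """Return (index, kept) for a list of .debug_info contributions.

    index maps dwo_id -> (offset, size) of the containing contribution
    in the packaged output; kept lists the contributions that were copied.
    """
    seen = set()
    index = {}
    kept = []
    offset = 0
    for c in contributions:
        # Drop only when *all* CUs were already included from elsewhere.
        if c.dwo_ids and all(d in seen for d in c.dwo_ids):
            continue
        # Otherwise copy the contribution whole, never splitting it up.
        kept.append(c)
        for d in c.dwo_ids:
            if d not in seen:
                seen.add(d)
                index[d] = (offset, c.size)
        offset += c.size
    return index, kept
```

Note that a duplicate CU can still end up in the output when it shares a contribution with a CU that has not been seen yet; that is the cost of not splitting contributions.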

You're probably still thinking of wrinkles I haven't addressed; let me know.
--paulr

From: David Blaikie [mailto:dblai...@gmail.com]
Sent: Thursday, May 04, 2017 5:30 PM
To: Robinson, Paul; dwarf-discuss@lists.dwarfstd.org; Eric Christopher
Subject: Re: [Dwarf-Discuss] Fission + cross-CU references (ref_addr)


On Thu, May 4, 2017 at 5:05 PM Robinson, Paul 
<paul.robin...@sony.com> wrote:
Skeleton units are pretty small; it's a 20-byte header, plus values for the 
compile_unit DIE, which is spec'd to have no children.  I would not be 
concerned about space there.  And having unique DWO IDs per unit seems pretty 
useful.
I tend to agree, though - what sort of uses do you have in mind?

A unique DWO ID per unit lets each DWO unit have a distinct entry in the index… 
saves the consumer the trouble of having to read the .debug_info section to 
find the units.

Yep

  If you want to require consumers to do more work, you can make DWO IDs be 
per-file instead of per-unit, and then there's no need for an INFO_FILE column 
because the INFO column would necessarily have to cover the entire .debug_info 
section from that file.

Yep - time/space tradeoff, and I'd probably err on the side of time myself (by 
having separate skeletons and cu_index entries for each CU) as you've 
suggested. Just floating the other as an alternative since it did come up.
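
For a sense of what one skeleton per CU costs, here is a small sketch that reads the 20-byte skeleton unit header mentioned earlier in this exchange and pulls out the DWO ID. It assumes 32-bit DWARF 5 and little-endian data; `SKELETON_HEADER` and `read_dwo_id` are hypothetical names for this illustration.

```python
import struct

DW_UT_skeleton = 0x04  # unit type code defined by DWARF 5

# unit_length (4), version (2), unit_type (1), address_size (1),
# debug_abbrev_offset (4), dwo_id (8): 20 bytes in 32-bit DWARF.
SKELETON_HEADER = struct.Struct("<IHBBIQ")


def read_dwo_id(header_bytes):
    """Return the DWO ID from the start of a DWARF 5 skeleton unit."""
    (_length, version, unit_type, _addr_size,
     _abbrev_offset, dwo_id) = SKELETON_HEADER.unpack(
        header_bytes[:SKELETON_HEADER.size])
    if version != 5 or unit_type != DW_UT_skeleton:
        raise ValueError("not a DWARF 5 skeleton unit header")
    return dwo_id
```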


In non-split DWARF, type units are spec'd to have their own object-file section 
contributions, separate from the compile unit(s);
That's sort of an implementation detail though, isn't it? DWARF just talks 
about bytes in sections (type units go in the debug_types section (or, now, the 
debug_info section)) and, yeah, you can use comdat groups and separate chunks 
of debug_types sections to deduplicate them, but I don't think DWARF 
requires/speaks about that, does it?
Actually DWARF 5 Appendix E does describe this; not as a required tactic, but a 
way to achieve the useful effect of deduplicating type units etc.  So yes it 
was overstating the case to say they are _spec'd_ to have their own 
contributions, but that would be what a producer would normally do.

Ah, cool - thanks for the pointer about where the wording is.


that's what lets type units have a COMDAT key and be uniqued by the linker, 
even though all those separate contributions have the same section name.  (In 
ELF, you have multiple section headers with the same section name.)  Surely the 
DWO file could be (is?) done the same way, with each type unit in its own 
contribution to the .debug_info section?
Funny story about that...
Heh.  Which way works better (types all together, or each in their own 
contribution) depends on whether your packager wants to deduplicate in a 
linker-like way, based on COMDATs,

I think neither GCC nor Clang used COMDATs in DWO files - but GCC still put 
them in separate sections (sections with the same name... ) - a weird beast to 
me, but apparently it's a thing that works.

But yeah, for non-Fission, COMDATs seem solid, though do represent a limitation 
on compressibility, etc.

or in a purpose-built way, by looking at the type-unit signatures directly.  
DWARF doesn't say you have to do one or the other, which provides 
implementation flexibility to the toolchain.  If your packager is willing to 
look through TUs for signatures and deduplicate that way, then you can stuff 
all the TUs into one section contribution and get better compression.  Quality 
of Implementation, as we like to say.
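
The "purpose-built" approach could look roughly like the sketch below, which walks one contribution and keeps only the first type unit seen for each signature. It is an illustration under assumptions: 32-bit DWARF 5, little-endian data, and a contribution that holds type units only (dropping a unit from a contribution that also holds CUs with DW_FORM_ref_addr references would shift the offsets those references rely on). `dedup_type_units` is a hypothetical name.

```python
import struct

DW_UT_type, DW_UT_split_type = 0x02, 0x06  # DWARF 5 unit type codes

# unit_length (4), version (2), unit_type (1), address_size (1)
COMMON = struct.Struct("<IHBB")
SIGNATURE_OFFSET = 12  # after the fields above plus debug_abbrev_offset (4)


def dedup_type_units(blob, seen=None):
    """Copy units from blob, skipping type units whose signature was seen."""
    seen = set() if seen is None else seen
    out = bytearray()
    pos = 0
    while pos < len(blob):
        length, version, unit_type, _ = COMMON.unpack_from(blob, pos)
        if version != 5:
            raise ValueError("this sketch only handles DWARF 5 unit headers")
        end = pos + 4 + length  # unit_length does not count itself
        if unit_type in (DW_UT_type, DW_UT_split_type):
            (sig,) = struct.unpack_from("<Q", blob, pos + SIGNATURE_OFFSET)
            if sig in seen:
                pos = end
                continue
            seen.add(sig)
        out += blob[pos:end]
        pos = end
    return bytes(out)
```

Passing the same `seen` set across several input files would deduplicate across the whole package rather than within one contribution.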

To be sure!


still there would need to be a special case where the TU's debug_info chunk 
would have an INFO_FILE contribution that represented the CU chunk. So a DWP 
tool would have to special-case the info chunk that contained the CUs (and would 
have to require that there be only one if there are to be TU->CU references). I 
suppose if TU->CU references aren't supported

I don't think TU->CU references are permitted.

For now, with Fission, I agree.

I think without Fission you could certainly use ref_addr to refer to something 
in a CU from a TU - /maybe/ even from a TU to another TU but I don't think so 
(not sure if the linker would do the right thing about reachability, etc - and 
if your TUs differed in layout, which they can even in Clang, that wouldn't 
work out well if it picked a different TU and either null'd out the ref_addr, 
or made it refer to the same offset in a different copy of the type (I don't 
think any linker/reloc construct would really result in this latter situation))

Certainly you could not have a v4 split TU referencing a CU, that would be 
impossible.

v4 didn't have split things (Fission being a v5 feature, I think), did it? 
What's the distinction you're drawing there?

  (Without relocations, you can't use DW_FORM_ref_addr to point from 
.debug_types to .debug_info; and DW_FORM_ref_sig8 is only for references to 
other type units.)  While you could engineer the possibility in v5, because 
type units have moved back into .debug_info and in principle you could arrange 
for DW_FORM_ref_addr to do that, I am morally certain there was no intent to 
allow that.

Right, I doubt there was any intent - but as we're choosing some new 
representations, etc, I'm wondering if it's something to think about.

Even without the TU->CU reference question, the TU/CU unification still means 
that a DWP creation tool would have to special-case the CUs in some way, or 
require the TUs to be placed in separate sections as GCC does (then it could 
treat each unit section as an indivisible blob).

Maybe this fits into quality of implementation - but I think the presence of 
cross-unit references makes this a bit more of a matter for the standard: how 
these groups are defined, what cross-CU references are resolved relative to, 
how type units can be dropped (or not), etc.

I'm sort of leaning towards "ref_addr offsets are resolved relative to the 
widest range of CUs in a single section that contains the referring DIE" - 
though that is a bit of a mouthful/awkward thing to implement.

- Dave

--paulr
_______________________________________________
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
