On Mon, May 4, 2015 at 12:27 PM, Adrian Prantl <[email protected]> wrote:
> > > >> On May 4, 2015, at 11:38 AM, David Blaikie <[email protected]> wrote: > >> > >> > >> > >> On Mon, May 4, 2015 at 11:24 AM, Adrian Prantl <[email protected]> > wrote: > >> > >>> On May 4, 2015, at 10:53 AM, David Blaikie <[email protected]> wrote: > >>> > >>> > >>> > >>> On Fri, May 1, 2015 at 8:52 PM, Adrian Prantl <[email protected]> > wrote: > >>>> > >>>>> On May 1, 2015, at 5:25 PM, David Blaikie <[email protected]> > wrote: > >>>>> > >>>>> > >>>>> > >>>>> On Fri, May 1, 2015 at 5:19 PM, Adrian Prantl <[email protected]> > wrote: > >>>>> > >>>>>> On May 1, 2015, at 4:55 PM, David Blaikie <[email protected]> > wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Fri, May 1, 2015 at 4:39 PM, Adrian Prantl <[email protected]> > wrote: > >>>>>> > >>>>>> > On May 1, 2015, at 10:01 AM, David Blaikie <[email protected]> > wrote: > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > On Fri, May 1, 2015 at 9:52 AM, Adrian Prantl <[email protected]> > wrote: > >>>>>> >> > >>>>>> >>> On May 1, 2015, at 9:23 AM, David Blaikie <[email protected]> > wrote: > >>>>>> >>> > >>>>>> >>> > >>>>>> >>> > >>>>>> >>> On Thu, Apr 30, 2015 at 5:21 PM, Adrian Prantl < > [email protected]> wrote: > >>>>>> >>> > >>>>>> >>> > On Apr 30, 2015, at 4:55 PM, David Blaikie < > [email protected]> wrote: > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > On Thu, Apr 30, 2015 at 4:31 PM, Adrian Prantl < > [email protected]> wrote: > >>>>>> >>> >> > >>>>>> >>> >> > On Mar 19, 2015, at 5:37 PM, David Blaikie < > [email protected]> wrote: > >>>>>> >>> >> > > >>>>>> >>> >> > > >>>>>> >>> >> > > >>>>>> >>> >> > On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl < > [email protected]> wrote: > >>>>>> >>> >> >> > >>>>>> >>> >> >> > On Mar 16, 2015, at 2:55 PM, David Blaikie < > [email protected]> wrote: > >>>>>> >>> >> >> > > >>>>>> >>> >> >> > > >>>>>> >>> >> >> > > >>>>>> >>> >> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul < > [email protected]> wrote: > >>>>>> >>> >> >> > Beyond the above (that 
using a new tag would mean this > would go from 'free' to 'not free' for GDB) having a new top level tag is > pretty substantial (we only have two at the moment, and with our talk of > modules being a "bag of dwarf" might go back to having one top level tag? > (it's not clear to me from DWARF4 whether DW_TAG_module is currently a > top-level tag, I don't think it is?) > >>>>>> >>> >> >> > > >>>>>> >>> >> >> >> The .debug_info section contains one or more > compilation units, partial units, or in DWARF 5, type units. DW_TAG_module > isn't a unit, if you want it to be handled independently then it would need > to be wrapped in a DW_TAG_partial_unit. You would probably then use > DW_TAG_imported_unit to refer to it, rather than DW_TAG_imported_module. > >>>>>> >>> >> >> >> > >>>>>> >>> >> >> > > >>>>>> >>> >> >> > This makes a fair bit of sense - though the > terminology's never going to quite line up with modules, I suspect, and > this would still require modifying existing consumers (well, GDB) that can > handle split-dwarf today, I suspect (not sure how it'd handle partial_unit > - maybe that does work? - and still don't know how existing consumers would > handle imported_unit either - could be worth some testing, as it sounds > sort of right out of several less right options). > >>>>>> >>> >> >> > >>>>>> >>> >> >> Thanks for all the input so far! > >>>>>> >>> >> >> To make this end of the discussion concrete, let’s sketch > some DWARF showing how this could look in practice.
> >>>>>> >>> >> >> > >>>>>> >>> >> >> ELF (no imports) > >>>>>> >>> >> >> ---------------- > >>>>>> >>> >> >> > >>>>>> >>> >> >> On ELF or COFF a foo.c referencing types from the module > Foundation looks like this: > >>>>>> >>> >> >> > >>>>>> >>> >> >> .debug_info: > >>>>>> >>> >> >> DW_TAG_compile_unit > >>>>>> >>> >> >> DW_AT_name(“foo.c”) > >>>>>> >>> >> >> > >>>>>> >>> >> >> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat) > >>>>>> >>> >> >> DW_TAG_partial_unit > >>>>>> >>> >> > > >>>>>> >>> >> > For now I'd suggest we use compile_unit - that way it'll > just work with existing split-dwarf consumers. We can see about > standardizing a top-level DW_TAG_module or using DW_TAG_partial_unit here > later, perhaps? I'm not sure. > >>>>>> >>> >> > > >>>>>> >>> >> >> > DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) > >>>>>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) > >>>>>> >>> >> >> > >>>>>> >>> >> >> > >>>>>> >>> >> >> Side question: Is .debug_info.dwo the right section to > put the module skeleton in, or should it be a .debug_info section like > normal fission skeletons? > >>>>>> >>> >> > > >>>>>> >>> >> > Skeletons go in .debug_info, the dwo sections are just for > the .dwo file (or the module file, in our new case - the extension isn't > actually important). > >>>>>> >>> >> > > >>>>>> >>> >> > It might be worth you compiling an example or two of > split-dwarf to see how this all works hands-on. 
> >>>>>> >>> >> > > >>>>>> >>> >> >> Mach-O (no comdat, no imports) > >>>>>> >>> >> >> ------------------------------ > >>>>>> >>> >> >> > >>>>>> >>> >> >> Mach-O doesn’t do comdat, so with -split-dwarf=Disable > (not sure if that option is the best discriminator) this could look like: > >>>>>> >>> >> >> > >>>>>> >>> >> >> .debug_info: > >>>>>> >>> >> >> DW_TAG_compile_unit > >>>>>> >>> >> >> DW_AT_name(“foo.c”) > >>>>>> >>> >> >> DW_TAG_partial_unit > >>>>>> >>> >> >> > DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) > >>>>>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) > >>>>>> >>> >> >> > >>>>>> >>> >> >> > >>>>>> >>> >> >> Mach-O (no comdat, with imports) > >>>>>> >>> >> >> ------------------------------ > >>>>>> >>> >> >> > >>>>>> >>> >> >> If we add the module import information to this, we get: > >>>>>> >>> >> >> > >>>>>> >>> >> >> .debug_info: > >>>>>> >>> >> >> DW_TAG_compile_unit > >>>>>> >>> >> >> DW_AT_name(“foo.c”) > >>>>>> >>> >> >> DW_TAG_imported_module > >>>>>> >>> >> >> DW_AT_import(DW_FORM_ref_addr 0x10) > >>>>>> >>> >> > > >>>>>> >>> >> > Since we went down the tangent of explaining > split-dwarf many emails ago, I've forgotten (& can't readily find) what we > were discussing about the ways the imported_module could work. > >>>>>> >>> >> > > >>>>>> >>> >> > The simplest representation I can think of would be to > have it reference, by signature, the module unit (whatever tag it uses) - > DW_FORM_ref_sig8, seems the simplest thing to do. > >>>>>> >>> >> > > >>>>>> >>> >> >> > >>>>>> >>> >> >> DW_TAG_partial_unit > >>>>>> >>> >> >> > DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) > >>>>>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) > >>>>>> >>> >> >> > >>>>>> >>> >> >> 0x10: > >>>>>> >>> >> > > >>>>>> >>> >> > This is inside the partial unit? I figured we'd just put > these attributes on the top level (compile_unit, or whatever it might be > later) - potentially conditionalized on platform, sure.
> >>>>>> >>> >> > > >>>>>> >>> >> >> DW_TAG_module > >>>>>> >>> >> >> DW_AT_name(“Foundation”) > >>>>>> >>> >> >> DW_AT_LLVM_sysroot(“/“) > >>>>>> >>> >> >> DW_AT_LLVM_include_dir(“”) > >>>>>> >>> >> >> DW_AT_LLVM_macros(“-DNDEBUG”) > >>>>>> >>> >> >> ... > >>>>>> >>> >> >> > >>>>>> >>> >> >> > >>>>>> >>> >> >> ELF (comdat, with imports) > >>>>>> >>> >> >> -------------------------- > >>>>>> >>> >> >> > >>>>>> >>> >> >> But now let’s go back to ELF. Since the skeleton with the > partial unit is comdat'd, I assume that this breaks the FORM_ref_addr used > in the DW_AT_import. We could reuse the module hash as a signature for the > module: > >>>>>> >>> >> >> > >>>>>> >>> >> >> .debug_info: > >>>>>> >>> >> >> DW_TAG_compile_unit > >>>>>> >>> >> >> DW_AT_name(“foo.c”) > >>>>>> >>> >> >> DW_TAG_imported_module > >>>>>> >>> >> >> DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE) > >>>>>> >>> >> > > >>>>>> >>> >> > Still only really need these imported_modules for lldb, > right? I'd consider having them off-by-default for non-darwin, but I'm not > strictly wedded to that notion. Wouldn't mind seeing size impact numbers of > some kind - if it's really fractional % increase & GDB doesn't fall over > when it sees them (in whatever FORM/tag/etc we decide on) then that's not > the end of the world. > >>>>>> >>> >> > > >>>>>> >>> >> > Just seems nice if the default mode is the nice, standard, > split-dwarf output. Doesn't need anything fancy. 
> >>>>>> >>> >> > > >>>>>> >>> >> > > >>>>>> >>> >> >> .debug_info.dwo (group 0x1234ABCDE, comdat) > >>>>>> >>> >> >> DW_TAG_partial_unit > >>>>>> >>> >> >> > DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) > >>>>>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) > >>>>>> >>> >> >> > >>>>>> >>> >> >> DW_TAG_module > >>>>>> >>> >> >> DW_AT_signature(“0x1234ABCDE”) > >>>>>> >>> >> >> DW_AT_name(“Foundation”) > >>>>>> >>> >> > > >>>>>> >>> >> > > >>>>>> >>> >> > The thing you haven't covered is the actual .dwo sections > (.debug_info.dwo (we'll probably need a simple stub compile_unit to make > this correct split-dwarf) and .debug_types.dwo being important - but all > the supporting .dwo sections will be necessary) that go in the module file. > >>>>>> >>> >> > > >>>>>> >>> >> >> This is bending the definition of DW_AT_signature, but I > guess it could be made to work. Or we could say that for now, users have to > choose between the comdat optimization and having the module imports > recorded in Dwarf, since GDB wouldn’t know what to do with that information > anyway. > >>>>>> >>> >> > >>>>>> >>> >> Sorry for the long delay. Here’s a more complete example > that should include all the suggestions made so far. For context I also > included external type references in the example although admittedly this > is a bit out of scope for this thread: > >>>>>> >>> >> > >>>>>> >>> >> ELF (typeunits, comdats, with imports) > >>>>>> >>> >> -------------------------------------- > >>>>>> >>> >> > >>>>>> >>> >> On ELF or COFF a bar.c referencing type Foo from the module > FooLib looks like this: > >>>>>> >>> >> > >>>>>> >>> >> bar.o > >>>>>> >>> >> ~~~~~ > >>>>>> >>> >> > >>>>>> >>> >> // To keep this example focussed/readable, I'm assuming that > bar.o itself was not compiled with fission. > >>>>>> >>> >> .debug_info: > >>>>>> >>> >> DW_TAG_compile_unit > >>>>>> >>> >> DW_AT_name(“bar.c”) > >>>>>> >>> >> ... 
> >>>>>> >>> >> > >>>>>> >>> >> DW_TAG_imported_module // <- This could be optional on > ELF. > >>>>>> >>> >> DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234) > >>>>>> >>> >> > >>>>>> >>> >> DW_TAG_variable > >>>>>> >>> >> DW_AT_name(“MyFoo”) > >>>>>> >>> >> DW_AT_type [DW_FORM_ref4] 0x20 > >>>>>> >>> >> 0x20: > >>>>>> >>> >> DW_TAG_structure_type > >>>>>> >>> >> DW_AT_declaration (true) > >>>>>> >>> >> DW_AT_signature [DW_FORM_ref_sig8] (0xF00) > >>>>>> >>> >> > >>>>>> >>> >> > >>>>>> >>> >> // Split DWARF skeleton CU for the module Foo. > >>>>>> >>> >> DW_TAG_compile_unit > >>>>>> >>> >> > DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) > >>>>>> >>> >> DW_AT_dwo_id(“0xFEDB9876”) > >>>>>> >>> >> ... > >>>>>> >>> >> > >>>>>> >>> >> // Comdat’d partial unit containing the optional module > descriptor. > >>>>>> >>> >> .debug_info, group 0xABCD1234, comdat > >>>>>> >>> >> DW_TAG_partial_unit > >>>>>> >>> >> DW_TAG_module > >>>>>> >>> >> DW_AT_name(“FooLib”) > >>>>>> >>> >> DW_AT_LLVM_sysroot(“/“) > >>>>>> >>> >> DW_AT_LLVM_include_dirs(“-I/path”) > >>>>>> >>> >> DW_AT_LLVM_macros(“-DNDEBUG”) > >>>>>> >>> >> ... > >>>>>> >>> >> > >>>>>> >>> >> FooLib-XYZ.pcm > >>>>>> >>> >> ~~~~~~~~~~~~~~ > >>>>>> >>> >> > >>>>>> >>> >> .debug_info.dwo > >>>>>> >>> >> DW_TAG_compile_unit > >>>>>> >>> >> DW_AT_dwo_id(“0xFEDB9876”) > >>>>>> >>> >> ... > >>>>>> >>> >> > >>>>>> >>> >> // Type unit for the type Foo. > >>>>>> >>> >> .debug_types.dwo, group 0xF00, comdat > >>>>>> >>> >> DW_TAG_type_unit > >>>>>> >>> >> DW_TAG_structure_type > >>>>>> >>> >> DW_AT_name (“Foo”) > >>>>>> >>> >> ... > >>>>>> >>> >> > >>>>>> >>> >> > >>>>>> >>> >> I think it awkward to have both the skeleton compile_unit in > .debug_info and the partial_unit containing the TAG_module. 
Personally I’d > prefer putting the TAG_module into the skeleton CU and then just refer to > it via a FORM_ref_addr; but if we want to put the TAG_module into a comdat > section, it looks like that’s what’s necessary. > >>>>>> >>> > > >>>>>> >>> > It's been a while & I've probably lost all the context, but I > think my original theory was to have the skeleton compile_unit be comdat'd > so they'd deduplicate on linking (so we'd only have one reference to the > module.dwo in the linked binary). I don't recall there being a need for a > separate partial_unit - I imagine we'd just put the LLDB/LLVM extension > attributes on the skeleton compile_unit and expect debuggers that didn't > understand them, to ignore them. > >>>>>> >>> > > >>>>>> >>> > Was there some reason this didn't work/make sense? Because > you need a DW_TAG_module to import with DW_TAG_imported_module? > >>>>>> >>> Using DW_TAG_module was the best practice that was recommended > on dwarf-discuss. > >>>>>> >>> > >>>>>> >>> Did they have any ideas on how to reference it without > duplicating it in every CU? > >>>>>> >> > >>>>>> >> We didn’t touch the deduplication issue. > >>>>>> >> > >>>>>> >>> Once we've got the "Bag O Dwarf" stuff (rather than the > narrower type units) this would be easier - (I suppose we could do a > partial solution/abuse of type units - use a type unit header (perhaps with > Eric's merged type/compile unit work) and a DW_FORM_ref_sig8 value for the > DW_AT_module in the DW_TAG_imported_module. > >>>>>> >>> > >>>>>> >>> Though I suppose if we're going to have DW_TAG_imported_module > in every CU that references a module, it might not be that big of a deal to > include the DW_TAG_module itself there too... while I don't care about this > scheme immediately, Google's growing LLDB investment in various platforms, > so I am vaguely concerned about getting this right & it's not immediately > obvious to me what that right answer is. 
> >>>>>> >> > >>>>>> >> Maybe the best path forward is to stage this by initially > putting the DW_TAG_module into the main CU and leave the deduplication as > an optimization to be implemented once the bag’o dwarf is more fleshed out. > This way we won’t do anything that would confuse consumers (assuming they > ignore unknown tags) and the extra overhead is likely not even going to be > noticeable, since all the string attributes inside the TAG_module can > already be deduplicated by traditional means. > >>>>>> > > >>>>>> > Perhaps. I'd still like to think through/document what this looks > like a bit more. Where the data ends up, what it's used for, etc. Sorry to > draw this out. > >>>>>> > > >>>>>> > :/ *ponders* > >>>>>> > >>>>>> > >>>>>> Let’s construct this: > >>>>>> > >>>>>> The most straightforward representation is to not unique the > TAG_module and place it into the main CU. > >>>>>> > >>>>>> bar.o > >>>>>> ~~~~~ > >>>>>> > >>>>>> .debug_info: > >>>>>> DW_TAG_compile_unit > >>>>>> ... > >>>>>> DW_TAG_imported_module > >>>>>> DW_AT_import [DW_FORM_ref4] (0x20) > >>>>>> 0x20: > >>>>>> DW_TAG_module > >>>>>> DW_AT_name(“FooLib”) > >>>>>> DW_AT_LLVM_sysroot(“/“) > >>>>>> DW_AT_LLVM_include_dirs(“-I/path”) > >>>>>> DW_AT_LLVM_macros(“-DNDEBUG”) > >>>>>> > >>>>>> Might as well put all these LLVM attributes on the skeleton CU, > though - so they can be deduplicated (& just put the dwo_id in this module > somewhere, perhaps just using the DW_AT_dwo_id attribute - possibly that's > the only attribute the DW_TAG_module would need, ideally). Unless we need > to consider the submodule issue (in which case the skeleton unit would > reference the whole module but the submodules would reference/describe the > respective submodules?)? 
> >>>>> > >>>>> We cannot put them into the skeleton CU if the skeleton CU is going > to be comdat’d, because we’d then have to refer to it via a signature and > that leads us directly to the can of worms discussed in the next paragraph > :-) > >>>>>> > >>>>>> ... > >>>>>> > >>>>>> // Split DWARF skeleton, comdat'd. > >>>>>> .debug_info, group 0xFEDB9876, comdat > >>>>>> DW_TAG_compile_unit > >>>>>> > DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) > >>>>>> DW_AT_dwo_id(“0xFEDB9876”) > >>>>>> ... > >>>>>> > >>>>>> On Mach-O the split DWARF skeleton would not be a comdat’d, but > llvm-dsymutil can just ignore it. > >>>>>> > >>>>>> > >>>>>> If we want to dedup the TAG_module we need to refer to it via > signature. This means we need to wrap it in a type_unit or a DWARF5 > TAG_type_unit. We might as well throw it in with the skeleton CU. > >>>>>> > >>>>>> .debug_info: > >>>>>> DW_TAG_compile_unit > >>>>>> ... > >>>>>> DW_TAG_imported_module > >>>>>> DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234) > >>>>>> > >>>>>> // Split DWARF skeleton, comdat'd. > >>>>>> .debug_info, group 0xFEDB9876, comdat > >>>>>> DW_TAG_compile_unit > >>>>>> > DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) > >>>>>> DW_AT_dwo_id(“0xFEDB9876”) > >>>>>> ... > >>>>>> DW_TAG_type_unit (signature: 0xABCD1234) > >>>>>> > >>>>>> Can't really put a type_unit inside a compile_unit - it'd need to > be top-level with an appropriate type unit header, etc. & then we'd need > two different units/headers, could still comdat them, but it's a weird > abuse of type units & would probably confuse consumers. I don't know > whether that's worth the effort. > >>>>> Oh right. > >>>>> > >>>>>> > >>>>>> DW_TAG_module > >>>>>> DW_AT_name(“FooLib”) > >>>>>> DW_AT_LLVM_sysroot(“/“) > >>>>>> DW_AT_LLVM_include_dirs(“-I/path”) > >>>>>> DW_AT_LLVM_macros(“-DNDEBUG”) > >>>>>> ... 
> >>>>>> > >>>>>> Now that raises the question about what happens with multiple > modules within one PCM. > >>>>>> > >>>>>> Is the right term "submodule"? It's sort of confusing to talk about > multiple modules within a pcm. > >>>>> > >>>>> Yes, a module with nested submodules. > >>>>> http://clang.llvm.org/docs/Modules.html#submodule-declaration > >>>>> > >>>>>> > >>>>>> Assuming that the ELF linker is linking and deduping all the > non-.dwo sections, we may lose some of the TAG_modules (if not every CU > imports all submodules) in the binary, but that wouldn’t matter because the > consumer would find all TAG_modules by signature in the .pcm. > >>>>>> > >>>>>> Is there any reason we need to reference the submodules > individually, rather than just reference the whole module > >>>>> > >>>>> My assumption is that an AST-aware debugger will want to import the > exact submodules that were imported by the CU before dropping into the > expression evaluator to replicate the environment of the CU as much as > possible. > >>>>> > >>>>> I'm just not picturing that. It seems pretty likely that a debugger > user is more likely to treat the whole set of names in the program, not > just those syntactically valid at that point in the source file. > >>>> > >>>> Module imports only work if the debugger has the precise list of > modules imported by the current CU. Clang modules are not namespaces, and > any two modules may conflict. > >>> > >>> Right, as you say - ODR & C languages. (& I've no idea if file-scoped > static/anonymous namespace things can go in C++ modules and what happens if > you have conflicting modules in that regard - I guess they can conflict > too? Dunno - maybe anon namespaces in C++ modules aren't allowed) > >> > >> It sounds like a strange concept to put an anonymous namespace into a > public module, but then again there exists > clang/test/Modules/anon-namespace.cpp (it only uses an empty anonymous > namespace, though).
I’m not sure how this is meant to be used. > >> > >>>> > >>>> The cool thing is that with the imported modules the debugger > effectively becomes clang and have the entire world visible to the current > CU available, including any types and functions that never made it into the > debug info because they were optimized out, or because there were > uninstantiated templates that cannot be represented by DWARF. > >>>> > >>>>> A simple example would be if I'm debugging LLVM and I'm in some > generic optimization pass, but I want to cast my Instruction pointer to > some specific instruction type to examine it in more detail - even though > this pass doesn't care about that specific Instruction type nor include the > header in which it's declared. > >>>> > >>>> If, however, the type lookup fails, the debugger can still fall back > to the traditional behavior, find the type in the accelerator tables and > reconstruct it from DWARF (if it is there). > >>> > >>> So you're going to need to implement fission (to at least some degree) > support in LLDB, then? (to support the case where you haven't linked debug > info with llvm-dsymutil, but you've hit one of these lookup problems where > you need to cross possibly-conflicting modules) > >> > >> Yes. Specifically, it won’t support type units, and it will look up > types by name rather than by signature. (cf. the second part of > http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html > ) > > > > How are you going to reference the types in the module's fission CU > without type units/signatures? Are you going to emit type declarations into > the normal CU and rely on the debugger to know that these declarations can > be resolved by looking elsewhere? (just without the benefit of constraining > that search to just looking for a matching TU?) 
> > If you look at the example in > http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html, > there will be an external type index (using the usual accelerator table > format) that maps an external type’s UID to a pcm. In the pcm there is an > extra accelerator table entry that maps UID to DIE offset. > I mean I guess that's up to you, but seems like a relatively large workaround compared to supporting type units... (I mean certainly seems like strictly less work to do the workaround than implementing type units in LLDB, but a relatively large amount of work to do/throw away eventually once LLDB supports type units) > > >> > >> > >>> > >>> OK, so I think it's probably reasonable for now to just add > DW_TAG_modules to the CU for each referenced module (or does it have to be > each referenced submodule? (can two submodules within a single module be > contradictory/conflicting?)). Since we don't have any good way to reference > the module as a foreign unit while deduplicating that unit... there's not > much point having the imported_module - but if you think it adds anything, > I'm open to ideas. > >> It could help keep things simpler. > >> Emitting it doesn’t add much semantic value because module imports > always occur at the top level, but it will make the transition to the > deduplicated TAG_modules easier: it could be easier to teach consumers > once about imported_module({ref to TAG_module}) rather than having them > also recognize top-level TAG_modules as an intermediate step. It’s also > slightly easier to implement in LLVM because the imported_module allows us > to anchor the TAG_module in the CU, but that’s not a very strong argument. > > > > Agreed on all counts (not a strong argument, but convenient enough, etc, > etc). > > > > I'm still not entirely sure what the right answer is here, though, which > is why I'm hesitant to bake anything in too strongly.
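The two-level lookup described at the top of this reply (an external type index that maps a type's UID to a pcm, plus an extra table in the pcm that maps the UID to a DIE offset) might be pictured roughly as follows. This is only a Python sketch under stated assumptions: the dictionaries stand in for the accelerator tables, and the UID string, the function name `resolve_external_type`, and the offset value are all hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the index in the .o file:
# external type UID -> module file (pcm) that contains the type.
external_type_index = {"c:@S@Foo": "FooLib-XYZ.pcm"}

# Hypothetical stand-in for the extra accelerator table in each pcm:
# UID -> DIE offset inside that pcm's debug info.
pcm_type_tables = {"FooLib-XYZ.pcm": {"c:@S@Foo": 0x2A}}


def resolve_external_type(uid: str) -> Optional[Tuple[str, int]]:
    """Return (pcm, DIE offset) for a UID, or None if either lookup misses."""
    pcm = external_type_index.get(uid)
    if pcm is None:
        return None  # not an external type known to this .o file
    offset = pcm_type_tables.get(pcm, {}).get(uid)
    if offset is None:
        return None  # the pcm has no entry for this UID
    return pcm, offset
```

A `None` result corresponds to the case where the debugger would have to fall back to looking the type up in ordinary DWARF.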
> > > > To come back to one of the outstanding questions: Do you need submodule > import information, or just module level (if modules cannot have internal > conflicts and you can't avoid cross-module conflicts just by lack of > visibility (I have no idea if either of those things are true) then you may > just need per-module not per-submodule info)? > > At the moment I do not think that it makes sense for two submodules to > conflict, but there is nothing in the clang documentation that explicitly > forbids this. With this in mind, I think it is reasonable to not support > submodules (at least initially) and always emit an import for the parent > module. > That’s what I wanted to write ... but as I’m browsing through our > documentation, > http://clang.llvm.org/docs/Modules.html#conflict-declarations explicitly > gives an example of two conflicting submodules, so maybe this is not a > reasonable simplification after all. On the other hand, a quick grep over > all system module maps on OS X doesn’t show a single conflict declaration. > > I still believe we do not need to support submodules right from the start, > but we should have a story for getting there if we need to. > Given the simple example that demonstrates the possibility, it seems fair to have a story for what that looks like, yes - even if a first pass/prototype doesn't support it. > > > > > Also, does each submodule need different special attributes/flags? If > the special codegen attributes you want are at the module level, it'd > probably be best to keep those on the Skeleton CU for the module (that will > be comdat folded, etc, on ELF - and they could be DWARF-aware deduplicated > by llvm-dsymutil) so they're not duplicated. The DW_TAG_module would then > just have a DW_AT_signature attribute or something similarly small/trivial > to point to the skeleton CU. > > The attributes are derived from cc1 command line arguments. No two > submodules imported by one CU can have different attributes.
All submodules > in a pcm also share their attributes. The skeleton CU > appears to be the most efficient place to put them, though perhaps not the > most logical one. > Why not the most logical? It'd be nice if it were a DW_TAG_module instead of a DW_TAG_compile_unit - but given the limited vocabulary we have in DWARF top level tags, it seems as good as we can have. > I would prefer to stick the attributes on the (top-level) DW_TAG_module > and later deduplicate the attributes together with the DW_TAG_module. > Sticking them on the skeleton won’t save any space in the .o files and > would save 3*4-8=4 bytes (3x FORM_strp for include, macro, and isysroot - > 1x FORM_ref_sig8) per CU and imported module. Seems nicer not to duplicate them, especially since not everyone will be using a debug-aware linker like llvm-dsymutil (LLDB on Windows or Linux won't have that convenience). Eventually we can use Bag O' DWARF for the skeleton CU, make it a DW_TAG_module (with more DWARF changes to allow that as a top-level tag, if desired/useful - I'm not sure it adds a lot) and have the imported_module reference it that way. (DW_TAG_imported_module, DW_AT_import, DW_FORM_ref_sig8) I'm not /hugely/ invested in this, but we do have people caring about LLDB on Linux and Windows, so avoiding tying the LLDB story to MachO and dsymutil, etc, seems valuable. > > > If you need submodule import lists, then each DW_TAG_module representing a > submodule would have a name (anything else?) and the signature referring to > its module skeleton CU. > > What I’m envisioning is > > .debug_info: > DW_TAG_compile_unit > ... > DW_TAG_imported_module > // import FooSubA > DW_AT_import [DW_FORM_ref4] (0x60) > > DW_TAG_module > DW_AT_name(“FooLib”) > DW_AT_LLVM_sysroot(“/”) > DW_AT_LLVM_include_dirs(“-I/path”) > DW_AT_LLVM_macros(“-DNDEBUG”) > 0x60: > DW_TAG_module > DW_AT_name(“FooSubA”) > // need not be emitted if not referenced.
> DW_TAG_module > DW_AT_name(“FooSubASubA”) > > // need not be emitted if not referenced. > DW_TAG_module > DW_AT_name(“FooSubB”) > > > > -- adrian > > > > >> > >>> Maybe later (when we have Bag O' DWARF) we can do that. & only do this > when targeting lldb (on by default on Darwin, off by default elsewhere). > >>> > >>> & LLDB, once it's got the Fission support it'll need for this anyway, > will fall back gracefully if these special modules are omitted. > >> > >> Sounds good to me! > >> > >> -- adrian > >> > >>> > >>> - David > >>> > >>> > >>>> > >>>>> (& have just a single, whole module in the pcm)? > >>>> > >>>> That’s probably not what you meant, but just to be sure: The pcm will > always have the entire module with all submodules in it. But the debugger > may choose to import only a subset of those. > >>>> > >>>>> > >>>>> file referred to by whichever skeleton CU makes it into the binary: > >>>>> > >>>>> FooLib-XYZ.pcm > >>>>> ~~~~~~~~~~~~~~ > >>>>> > >>>>> .debug_info.dwo > >>>>> DW_TAG_compile_unit > >>>>> DW_AT_dwo_id(“0xFEDB9876”) > >>>>> ... > >>>>> > >>>>> DW_TAG_type_unit (signature: 0xABCD1234) > >>>>> DW_TAG_module > >>>>> DW_AT_name(“FooLib”) > >>>>> ... > >>>>> DW_TAG_type_unit (signature: 0xCDEF3456) > >>>>> DW_TAG_module > >>>>> DW_AT_name(“FooLib”) > >>>>> DW_TAG_module > >>>>> DW_AT_name(“SubFoo”) > >>>>> ... > >>>>> > >>>>> So.. this should work as long as nobody points out that a module > isn’t really a type. > >>>>> > >>>>> Yeah, probably worth waiting for "Bag O DWARF". > >>>>> > >>>>> For now, as you mentioned earlier, maybe just putting the > imported_module and the module into the compile_unit when tuning for LLDB > (so Darwin by default, and anywhere else where someone tunes for LLDB in > the future) & leave them out otherwise. > >>>> > >>>> Sounds perfectly reasonable. > >>>>> > >>>>> Could you remind me why LLDB wants to know which modules are > referenced from a CU? (rather than just all the modules used by a program > overall?)
> >>>> > >>>> LLDB uses clang for the expression evaluation. Traditionally it would > look up a type in DWARF, build a clang AST out of it and then import it. > With this it could directly import the clang modules and have access to > everything in the module. But, clang modules are not namespaces, so modules > can conflict (and that would probably manifest as a crash in libclang). > >>>> > >>>> What's an example of such a conflict? Is that valid (or is it just in > ODR violations) - as mentioned above, it seems to me that only importing > the things lexically available in this source file isn't what a debugger > user would really want. I certainly think I'd trip over that a lot. > >>> > >>> Keep in mind that Objective-C (and C) do not have an ODR, so it’s not > just “just” :-) > >>> Being able to import modules does not mean that the debugger cannot > still fall back to loading types from DWARF; in fact it will have to do > that for all local types anyway. > >>> > >>> -- adrian > >>> > >>>> > >>>> It therefore needs to know which modules are imported in the current > CU before dropping into the expression evaluator. > >>>> > >>>> - adrian > >>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On Mach-O, in the absence of comdats, we have: > >>>>> > >>>>> bar.o > >>>>> ~~~~~ > >>>>> > >>>>> .debug_info: > >>>>> DW_TAG_compile_unit > >>>>> ... > >>>>> DW_TAG_imported_module > >>>>> DW_AT_import [DW_FORM_ref4] (0x20) > >>>>> > >>>>> DW_TAG_module // uniqued by dsymutil. > >>>>> DW_AT_name(“FooLib”) > >>>>> DW_AT_LLVM_sysroot(“/”) > >>>>> DW_AT_LLVM_include_dirs(“-I/path”) > >>>>> DW_AT_LLVM_macros(“-DNDEBUG”) > >>>>> ... > >>>>> > >>>>> // Split DWARF skeleton, thrown out by dsymutil. > >>>>> > >>>>> Thrown out? Because it's going to read everything in from the module > and merge it into a single linked debug info blob, I take it?
> >>>>> > >>>>> .debug_info, group 0xFEDB9876, comdat > >>>>> DW_TAG_compile_unit > >>>>> > DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) > >>>>> DW_AT_dwo_id(“0xFEDB9876”) > >>>>> ... > >>>>> > >>>>> FooLib-XYZ.pcm > >>>>> ~~~~~~~~~~~~~~ > >>>>> > >>>>> .debug_info: > >>>>> DW_TAG_compile_unit > >>>>> DW_AT_dwo_id(“0xFEDB9876”) > >>>>> ... > >>>>> > >>>>> DW_TAG_module > >>>>> DW_AT_name(“FooLib”) > >>>>> DW_TAG_module > >>>>> DW_AT_name(“SubFoo”) > >>>>> ... > >>>>> > >>>>> -- adrian > >>>>> > >>>>> > > >>>>> >> > >>>>> >>> > >>>>> >>> > If it turns out that's the right way to get a target for the > imported_module, we could put both the skeleton CU and the partial unit in > the same comdat and dedup them both together. > >>>>> >>> > >>>>> >>> I think this works as long as we only have one TAG_module per > .pcm file (because we need to refer to it via signature). > >>>>> >>> > >>>>> >>> Not quite following here - why would we have more than one > module per pcm - a pcm is a module, right? > >>>>> >> > >>>>> >> Clang modules may have submodules and a compile unit could import > two submodules that live in the same .pcm file. For example on Darwin there > is a module Darwin.pcm that contains a submodule “C" that contains the > submodule “stdio". > >>>>> > > >>>>> > OK, so this bit's relevant to your use case in LLDB of loading the > right things for the right context, but not relevant to the context-less > debuggers like GDB that will just treat everything as one big namespace > (except for file-local things, etc). So it's important for your imported > modules but not for the basic Fission style debug reference. > >>>>> > > >>>>> > Well, maybe - I'm not sure what you're picturing in terms of the > DWARF in the module for submodules? If you want that granularity we'll have > to talk about how to split the DWARF in the module into chunks per > submodule? 
> >>>>> >
> >>>>> >>
> >>>>> >>>
> >>>>> >>> But if we don’t mind having duplicate dwo_* references in the same .o file this would also work with more than one TAG_module (or submodules).
> >>>>> >>>
> >>>>> >>>
> >>>>> >>> .debug_info:
> >>>>> >>>   DW_TAG_compile_unit
> >>>>> >>>     DW_AT_name(“bar.c”)
> >>>>> >>>     ...
> >>>>> >>>
> >>>>> >>>     DW_TAG_imported_module // <- This could be optional on ELF.
> >>>>> >>>       DW_AT_import [DW_FORM_ref_sig8] (0xFEDB9876)
> >>>>> >>>
> >>>>> >>>     ...
> >>>>> >>>
> >>>>> >>> // Comdat’d split DWARF skeleton CU for the module Foo.
> >>>>> >>> .debug_info, group 0xFEDB9876, comdat
> >>>>> >>>   DW_TAG_compile_unit
> >>>>> >>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
> >>>>> >>>     DW_AT_dwo_id(“0xFEDB9876”)
> >>>>> >>>     ...
> >>>>> >>>
> >>>>> >>>     DW_TAG_module
> >>>>> >>>       DW_AT_name(“FooLib”)
> >>>>> >>>       DW_AT_LLVM_sysroot(“/”)
> >>>>> >>>       DW_AT_LLVM_include_dirs(“-I/path”)
> >>>>> >>>       DW_AT_LLVM_macros(“-DNDEBUG”)
> >>>>> >>>       ...
> >>>>> >>>
> >>>>> >>>
> >>>>> >>> >
> >>>>> >>> > But this gets into complicated territory when the original binary is built with fission... which will be relevant for modules on ELF with LLDB. Hmm, maybe it's not too complicated - the partial_unit would end up in the .dwo file (maybe we'd have to teach the .dwo file to deduplicate these too - the same way it does for type units... - might require a new header to include the hash, etc :/)... would be tricky to have the dwp tool resolve the relocations to these things. Cross-unit references as you've got there aren't something that every DWARF consumer is totally cool with, I don't think?
> >>>>> >>>
> >>>>> >>> Ah. I thought the deduplication happens because all ELF sections sharing the same group are uniqued based on the group id.
> >>>>> >>>
> >>>>> >>> COMDAT groups deduplicate for a normal non-fission build, but fission linking doesn't require the .dwo file to use/contain COMDATs as it uses a DWARF-aware tool (so you don't bother putting the type units in COMDAT groups, for example - the fission linker knows how to parse debug_types, find the type unit headers and their hashes and deduplicates them that way).
> >>>>> >>
> >>>>> >> Ok that makes sense.
> >>>>> >>
> >>>>> >> -- adrian
> >>>>> >>
> >>>>> >>>
> >>>>> >>> It certainly would be nice if we could avoid introducing a new .debug_info header...
> >>>>> >>>
> >>>>> >>> >
> >>>>> >>> > Sort of inclined to have the imported module stuff just for LLDB, but I've lost some of the context for that in the ensuing weeks.
> >>>>> >>>
> >>>>> >>> -- adrian
> >>>>> >>>
> >>>>> >>> >
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >> MachO (no typeunits, no comdats, with imports)
> >>>>> >>> >> ----------------------------------------------
> >>>>> >>> >>
> >>>>> >>> >> Since we don’t have comdat sections in Mach-O and we don’t have the tool support for type units, the way that external types can be referenced necessarily needs to be a bit different. The design that Greg and I came up with for Mach-O relies on llvm-dsymutil to fix up the DWARF for non-module-aware consumers. Just as ELF DWARF consumers need not be able to tell the difference between module debugging and split DWARF, on Mach-O the .dSYM bundle generated by llvm-dsymutil looks like traditional DWARF.
> >>>>> >>> >>
> >>>>> >>> >> There are three differences in the DWARF output that make this possible:
> >>>>> >>> >> - Refer to external types by UID rather than by type signature.
> >>>>> >>> >>   (This doubles as the key that allows a debugger to import the type directly from the AST and protects us against hash collisions)
> >>>>> >>> >> - Add an index to the .o file that maps UID -> module file.
> >>>>> >>> >>   (Fast lookup + UIDs for C and ObjC are only unique within a module)
> >>>>> >>> >> - Add an entry for each type’s UID to the types accelerator table.
> >>>>> >>> >>   (Fast lookup)
> >>>>> >>> >>
> >>>>> >>> >> bar.o
> >>>>> >>> >> ~~~~~
> >>>>> >>> >>
> >>>>> >>> >> .debug_info:
> >>>>> >>> >>   DW_TAG_compile_unit
> >>>>> >>> >>     DW_AT_name(“bar.c”)
> >>>>> >>> >>     DW_TAG_imported_module
> >>>>> >>> >>       DW_AT_import(DW_FORM_ref_addr 0x40)
> >>>>> >>> >>
> >>>>> >>> >>     DW_TAG_variable
> >>>>> >>> >>       DW_AT_name(“MyFoo”)
> >>>>> >>> >>       DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”) // We could use a custom FORM here
> >>>>> >>> >>
> >>>>> >>> >>   // Skeleton unit.
> >>>>> >>> >>   DW_TAG_compile_unit
> >>>>> >>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
> >>>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
> >>>>> >>> >>     ...
> >>>>> >>> >> 0x40:
> >>>>> >>> >>     DW_TAG_module
> >>>>> >>> >>       DW_AT_name(“FooLib”)
> >>>>> >>> >>       DW_AT_LLVM_sysroot(“/”)
> >>>>> >>> >>       DW_AT_LLVM_include_dirs(“-I/path”)
> >>>>> >>> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
> >>>>> >>> >>
> >>>>> >>> >> // This index uses the usual accelerator table format.
> >>>>> >>> >> .apple_exttypes:
> >>>>> >>> >>   { “_ZTS3Foo” => debug_str offset of “/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }
> >>>>> >>> >>
> >>>>> >>> >> FooLib-XYZ.pcm
> >>>>> >>> >> ~~~~~~~~~~~~~~
> >>>>> >>> >>
> >>>>> >>> >> .debug_info
> >>>>> >>> >>   DW_TAG_compile_unit
> >>>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
> >>>>> >>> >>
> >>>>> >>> >> 0x80:
> >>>>> >>> >>     DW_TAG_structure_type
> >>>>> >>> >>       DW_AT_name (“Foo”)
> >>>>> >>> >>       DW_AT_signature
> >>>>> >>> >>       ...
> >>>>> >>> >>
> >>>>> >>> >> // In addition to the entry for “Foo”, there is also an entry for the type’s UID “_ZTS3Foo” pointing to the type definition DIE.
> >>>>> >>> >> .apple_types
> >>>>> >>> >>   { “Foo” => 0x80 }
> >>>>> >>> >>   { “_ZTS3Foo” => 0x80 }
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >> When the debug info linker (llvm-dsymutil) is run, it first pulls in the .debug_info section from the clang module and fixes up all the DW_FORM_strp external type references by turning them into a DW_FORM_ref_addr that references the type in the DW_TAG_compile_unit pulled in from the module. To find the correct type DIE it looks up the UID in the .apple_exttypes index, finds the module, looks up the UID in the regular .apple_types accelerator table and replaces the temporary DW_FORM_strp with a DW_FORM_ref_addr (which incidentally takes up the same amount of space in the DIE).
> >>>>> >>> >>
> >>>>> >>> >>
> >>>>> >>> >> Thoughts?
> >>>>> >>> >> --
> >>>>> >>> >> adrian
> >>>>> >>> >>
> >>>>> >>> >
> >>>>> >>>
> >>>>> >>
> >>>>> >>
> >>>>> >
> >>>
> >>>
> >>
> >>
> >
>
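For reference, the fix-up step described in the quoted Mach-O proposal could be sketched roughly like this. It is only an illustration in Python, not llvm-dsymutil code: the `Attr` class, the `fix_up_external_types` function and the dict-based tables are hypothetical stand-ins for DIE attributes and the accelerator tables, and the sketch assumes the DIE offsets in the module's .apple_types table are already valid in the linked .debug_info. Leaving an unknown UID untouched is also an assumption; the proposal does not say what should happen in that case.

```python
from dataclasses import dataclass


@dataclass
class Attr:
    """Hypothetical stand-in for a DW_AT_type attribute on a DIE."""
    form: str      # "DW_FORM_strp" before the fix-up, "DW_FORM_ref_addr" after
    value: object  # type UID string before the fix-up, DIE offset after


def fix_up_external_types(attrs, apple_exttypes, module_apple_types):
    """Turn UID-based type references into direct DIE references.

    apple_exttypes:     UID -> module file path (the index in the .o file)
    module_apple_types: module file path -> that module's .apple_types table
                        (name or UID -> offset of the type definition DIE)
    """
    for attr in attrs:
        if attr.form != "DW_FORM_strp":
            continue  # not an external type reference
        module_path = apple_exttypes.get(attr.value)
        if module_path is None:
            continue  # assumption: unknown UIDs are left as they are
        # Look the UID up again, this time in the module's own types table.
        die_offset = module_apple_types[module_path][attr.value]
        # Replace the temporary string reference with a DIE reference.
        attr.form = "DW_FORM_ref_addr"
        attr.value = die_offset
    return attrs


# Example data mirroring the bar.o / FooLib-XYZ.pcm dumps in the proposal.
PCM = "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm"
apple_exttypes = {"_ZTS3Foo": PCM}
module_apple_types = {PCM: {"Foo": 0x80, "_ZTS3Foo": 0x80}}

my_foo_type = Attr("DW_FORM_strp", "_ZTS3Foo")
fix_up_external_types([my_foo_type], apple_exttypes, module_apple_types)
```

With the example tables above, the DW_AT_type of MyFoo ends up as a DW_FORM_ref_addr pointing at 0x80, the Foo definition pulled in from the module.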
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
