On Fri, May 1, 2015 at 8:52 PM, Adrian Prantl <[email protected]> wrote:
> > On May 1, 2015, at 5:25 PM, David Blaikie <[email protected]> wrote: > > > > On Fri, May 1, 2015 at 5:19 PM, Adrian Prantl <[email protected]> wrote: > >> >> On May 1, 2015, at 4:55 PM, David Blaikie <[email protected]> wrote: >> >> >> >> On Fri, May 1, 2015 at 4:39 PM, Adrian Prantl <[email protected]> wrote: >> >>> >>> > On May 1, 2015, at 10:01 AM, David Blaikie <[email protected]> wrote: >>> > >>> > >>> > >>> > On Fri, May 1, 2015 at 9:52 AM, Adrian Prantl <[email protected]> >>> wrote: >>> >> >>> >>> On May 1, 2015, at 9:23 AM, David Blaikie <[email protected]> >>> wrote: >>> >>> >>> >>> >>> >>> >>> >>> On Thu, Apr 30, 2015 at 5:21 PM, Adrian Prantl <[email protected]> >>> wrote: >>> >>> >>> >>> > On Apr 30, 2015, at 4:55 PM, David Blaikie <[email protected]> >>> wrote: >>> >>> > >>> >>> > >>> >>> > >>> >>> > On Thu, Apr 30, 2015 at 4:31 PM, Adrian Prantl <[email protected]> >>> wrote: >>> >>> >> >>> >>> >> > On Mar 19, 2015, at 5:37 PM, David Blaikie <[email protected]> >>> wrote: >>> >>> >> > >>> >>> >> > >>> >>> >> > >>> >>> >> > On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl < >>> [email protected]> wrote: >>> >>> >> >> >>> >>> >> >> > On Mar 16, 2015, at 2:55 PM, David Blaikie < >>> [email protected]> wrote: >>> >>> >> >> > >>> >>> >> >> > >>> >>> >> >> > >>> >>> >> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul < >>> [email protected]> wrote: >>> >>> >> >> > Beyond the above (that using a new tag would mean this would >>> go from 'free' to 'not free' for GDB) having a new top level tag is pretty >>> substantial (we only have two at the moment, and with our talk of modules >>> being a "bag of dwarf" might go back to having one top level tag? (it's not >>> clear to me from DWARF4 whether DW_TAG_module is currently a top-level tag, >>> I don't think it is?) >>> >>> >> >> > >>> >>> >> >> >> The .debug_info section contains one or more compilation >>> units, partial units, or in DWARF 5, type units. 
DW_TAG_module isn't a >>> unit, if you want it to be handled independently then it would need to be >>> wrapped in a DW_TAG_partial_unit. You would probably then use >>> DW_TAG_imported_unit to refer to it, rather than DW_TAG_imported_module. >>> >>> >> >> >> >>> >>> >> >> > >>> >>> >> >> > This makes a fair bit of sense - though the terminology's >>> never going to quite line up with modules, I suspect, and this would still >>> require modifying existing consumers (well, GDB) that can handle >>> split-dwarf today, I suspect (not sure how it'd handle partial_unit - maybe >>> that does work? - and still don't know how existing consumers would handle >>> imported_unit either - could be worth some testing, as it sounds sort of >>> right out of several less right options). >>> >>> >> >> >>> >>> >> >> Thanks for all the input so far! >>> >>> >> >> To concretize this end of the discussion up let’s sketch some >>> dwarf of how this could look like in practice. >>> >>> >> >> >>> >>> >> >> ELF (no imports) >>> >>> >> >> ---------------- >>> >>> >> >> >>> >>> >> >> On ELF or COFF a foo.c referencing types from the module >>> Foundation looks like this: >>> >>> >> >> >>> >>> >> >> .debug_info: >>> >>> >> >> DW_TAG_compile_unit >>> >>> >> >> DW_AT_name(“foo.c”) >>> >>> >> >> >>> >>> >> >> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat) >>> >>> >> >> DW_TAG_partial_unit >>> >>> >> > >>> >>> >> > For now I'd suggest we use compile_unit - that way it'll just >>> work with existing split-dwarf consumers. We can see about standardizing a >>> top-level DW_TAG_module or using DW_TAG_partial_unit here later, perhaps? >>> I'm not sure. >>> >>> >> > >>> >>> >> >> >>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) >>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> Side question: Is .debug_info.dwo the right section to put the >>> module skeleton in, or should it be a .debug_info section like normal >>> fission skeletons? 
>>> >>> >> > >>> >>> >> > Skeletons go in .debug_info, the dwo sections are just for the >>> .dwo file (or the module file, in our new case - the extension isn't >>> actually important). >>> >>> >> > >>> >>> >> > It might be worth you compiling an example or two of >>> split-dwarf to see how this all works hands-on. >>> >>> >> > >>> >>> >> >> Mach-O (no comdat, no imports) >>> >>> >> >> ------------------------------ >>> >>> >> >> >>> >>> >> >> Mach-O doesn’t do comdat, so with -split-dwarf=Disable (not >>> sure if that option is the best discriminator) this could look like: >>> >>> >> >> >>> >>> >> >> .debug_info: >>> >>> >> >> DW_TAG_compile_unit >>> >>> >> >> DW_AT_name(“foo.c”) >>> >>> >> >> DW_TAG_partial_unit >>> >>> >> >> >>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) >>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> Mach-O (no comdat, with imports) >>> >>> >> >> ------------------------------ >>> >>> >> >> >>> >>> >> >> If we add the module import information to this, we get: >>> >>> >> >> >>> >>> >> >> .debug_info: >>> >>> >> >> DW_TAG_compile_unit >>> >>> >> >> DW_AT_name(“foo.c”) >>> >>> >> >> DW_TAG_imported_module >>> >>> >> >> DW_AT_import(DW_FORM_ref_addr 0x10) >>> >>> >> > >>> >>> >> > Since we got went down the tangent of explaining split-dwarf >>> many emails ago, I've forgotten (& can't readily find) what we were >>> discussing about what ways the imported_module could work. >>> >>> >> > >>> >>> >> > The simplest representation I can think of would be to have it >>> reference, by signature, the module unit (whatever tag it uses) - >>> DW_FORM_ref_sig8, seems the simplest thing to do. >>> >>> >> > >>> >>> >> >> >>> >>> >> >> DW_TAG_partial_unit >>> >>> >> >> >>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) >>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) >>> >>> >> >> >>> >>> >> >> 0x10: >>> >>> >> > >>> >>> >> > This is inside the partial unit? 
I figured we'd just put these >>> attributes on the top level (compile_unit, or whatever it might be later) - >>> potentially conditionalized on platform, sure. >>> >>> >> > >>> >>> >> >> DW_TAG_module >>> >>> >> >> DW_AT_name(“Foundation”) >>> >>> >> >> DW_AT_LLVM_sysroot(“/“) >>> >>> >> >> DW_AT_LLVM_include_dir(“”) >>> >>> >> >> DW_AT_LLVM_macros(“-DNDEBUG”) >>> >>> >> >> ... >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> ELF (comdat, with imports) >>> >>> >> >> -------------------------- >>> >>> >> >> >>> >>> >> >> But now let’s go back to ELF. Since the skeleton with the >>> partial unit is comdat'd, I assume that this breaks the FORM_ref_addr used >>> in the DW_AT_import. We could reuse the module hash as a signature for the >>> module: >>> >>> >> >> >>> >>> >> >> .debug_info: >>> >>> >> >> DW_TAG_compile_unit >>> >>> >> >> DW_AT_name(“foo.c”) >>> >>> >> >> DW_TAG_imported_module >>> >>> >> >> DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE) >>> >>> >> > >>> >>> >> > Still only really need these imported_modules for lldb, right? >>> I'd consider having them off-by-default for non-darwin, but I'm not >>> strictly wedded to that notion. Wouldn't mind seeing size impact numbers of >>> some kind - if it's really fractional % increase & GDB doesn't fall over >>> when it sees them (in whatever FORM/tag/etc we decide on) then that's not >>> the end of the world. >>> >>> >> > >>> >>> >> > Just seems nice if the default mode is the nice, standard, >>> split-dwarf output. Doesn't need anything fancy. 
>>> >>> >> > >>> >>> >> > >>> >>> >> >> .debug_info.dwo (group 0x1234ABCDE, comdat) >>> >>> >> >> DW_TAG_partial_unit >>> >>> >> >> >>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) >>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) >>> >>> >> >> >>> >>> >> >> DW_TAG_module >>> >>> >> >> DW_AT_signature(“0x1234ABCDE”) >>> >>> >> >> DW_AT_name(“Foundation”) >>> >>> >> > >>> >>> >> > >>> >>> >> > The thing you haven't covered is the actual .dwo sections >>> (.debug_info.dwo (we'll probably need a simple stub compile_unit to make >>> this correct split-dwarf) and .debug_types.dwo being important - but all >>> the supporting .dwo sections will be necessary) that go in the module file. >>> >>> >> > >>> >>> >> >> This is bending the definition of DW_AT_signature, but I guess >>> it could be made to work. Or we could say that for now, users have to >>> choose between the comdat optimization and having the module imports >>> recorded in Dwarf, since GDB wouldn’t know what to do with that information >>> anyway. >>> >>> >> >>> >>> >> Sorry for the long delay. Here’s a more complete example that >>> should include all the suggestions made so far. For context I also included >>> external type references in the example although admittedly this is a bit >>> out of scope for this thread: >>> >>> >> >>> >>> >> ELF (typeunits, comdats, with imports) >>> >>> >> -------------------------------------- >>> >>> >> >>> >>> >> On ELF or COFF a bar.c referencing type Foo from the module >>> FooLib looks like this: >>> >>> >> >>> >>> >> bar.o >>> >>> >> ~~~~~ >>> >>> >> >>> >>> >> // To keep this example focussed/readable, I'm assuming that >>> bar.o itself was not compiled with fission. >>> >>> >> .debug_info: >>> >>> >> DW_TAG_compile_unit >>> >>> >> DW_AT_name(“bar.c”) >>> >>> >> ... >>> >>> >> >>> >>> >> DW_TAG_imported_module // <- This could be optional on ELF. 
>>> >>> >> DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234) >>> >>> >> >>> >>> >> DW_TAG_variable >>> >>> >> DW_AT_name(“MyFoo”) >>> >>> >> DW_AT_type [DW_FORM_ref4] 0x20 >>> >>> >> 0x20: >>> >>> >> DW_TAG_structure_type >>> >>> >> DW_AT_declaration (true) >>> >>> >> DW_AT_signature [DW_FORM_ref_sig8] (0xF00) >>> >>> >> >>> >>> >> >>> >>> >> // Split DWARF skeleton CU for the module Foo. >>> >>> >> DW_TAG_compile_unit >>> >>> >> >>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) >>> >>> >> DW_AT_dwo_id(“0xFEDB9876”) >>> >>> >> ... >>> >>> >> >>> >>> >> // Comdat’d partial unit containing the optional module >>> descriptor. >>> >>> >> .debug_info, group 0xABCD1234, comdat >>> >>> >> DW_TAG_partial_unit >>> >>> >> DW_TAG_module >>> >>> >> DW_AT_name(“FooLib”) >>> >>> >> DW_AT_LLVM_sysroot(“/“) >>> >>> >> DW_AT_LLVM_include_dirs(“-I/path”) >>> >>> >> DW_AT_LLVM_macros(“-DNDEBUG”) >>> >>> >> ... >>> >>> >> >>> >>> >> FooLib-XYZ.pcm >>> >>> >> ~~~~~~~~~~~~~~ >>> >>> >> >>> >>> >> .debug_info.dwo >>> >>> >> DW_TAG_compile_unit >>> >>> >> DW_AT_dwo_id(“0xFEDB9876”) >>> >>> >> ... >>> >>> >> >>> >>> >> // Type unit for the type Foo. >>> >>> >> .debug_types.dwo, group 0xF00, comdat >>> >>> >> DW_TAG_type_unit >>> >>> >> DW_TAG_structure_type >>> >>> >> DW_AT_name (“Foo”) >>> >>> >> ... >>> >>> >> >>> >>> >> >>> >>> >> I think it awkward to have both the skeleton compile_unit in >>> .debug_info and the partial_unit containing the TAG_module. Personally I’d >>> prefer putting the TAG_module into the skeleton CU and then just refer to >>> it via a FORM_ref_addr; but if we want to put the TAG_module into a comdat >>> section, it looks like that’s what’s necessary. >>> >>> > >>> >>> > It's been a while & I've probably lost all the context, but I >>> think my original theory was to have the skeleton compile_unit be comdat'd >>> so they'd deduplicate on linking (so we'd only have one reference to the >>> module.dwo in the linked binary). 
I don't recall there being a need for a >>> separate partial_unit - I imagine we'd just put the LLDB/LLVM extension >>> attributes on the skeleton compile_unit and expect debuggers that didn't >>> understand them, to ignore them. >>> >>> > >>> >>> > Was there some reason this didn't work/make sense? Because you >>> need a DW_TAG_module to import with DW_TAG_imported_module? >>> >>> Using DW_TAG_module was the best practice that was recommended on >>> dwarf-discuss. >>> >>> >>> >>> Did they have any ideas on how to reference it without duplicating >>> it in every CU? >>> >> >>> >> We didn’t touch the deduplication issue. >>> >> >>> >>> Once we've got the "Bag O Dwarf" stuff (rather than the narrower >>> type units) this would be easier - (I suppose we could do a partial >>> solution/abuse of type units - use a type unit header (perhaps with Eric's >>> merged type/compile unit work) and a DW_FORM_ref_sig8 value for the >>> DW_AT_module in the DW_TAG_imported_module. >>> >>> >>> >>> Though I suppose if we're going to have DW_TAG_imported_module in >>> every CU that references a module, it might not be that big of a deal to >>> include the DW_TAG_module itself there too... while I don't care about this >>> scheme immediately, Google's growing LLDB investment in various platforms, >>> so I am vaguely concerned about getting this right & it's not immediately >>> obvious to me what that right answer is. >>> >> >>> >> Maybe the best path forward is to stage this by initially putting the >>> DW_TAG_module into the main CU and leave the deduplication as an >>> optimization to be implemented once the bag’o dwarf is more fleshed out. >>> This way we won’t do anything that would confuse consumers (assuming they >>> ignore unknown tags) and the extra overhead is likely not even going to be >>> noticeable, since all the string attributes inside the TAG_module can >>> already be deduplicated by traditional means. >>> > >>> > Perhaps. 
I'd still like to think through/document what this looks like >>> a bit more. Where the data ends up, what it's used for, etc. Sorry to draw >>> this out. >>> > >>> > :/ *ponders* >>> >>> >>> Let’s construct this: >>> >>> The most straightforward representation is to not unique the TAG_module >>> and place it into the main CU. >>> >>> bar.o >>> ~~~~~ >>> >>> .debug_info: >>> DW_TAG_compile_unit >>> ... >>> DW_TAG_imported_module >>> DW_AT_import [DW_FORM_ref4] (0x20) >>> 0x20: >>> DW_TAG_module >>> DW_AT_name(“FooLib”) >>> DW_AT_LLVM_sysroot(“/“) >>> DW_AT_LLVM_include_dirs(“-I/path”) >>> DW_AT_LLVM_macros(“-DNDEBUG”) >>> >> >> Might as well put all these LLVM attributes on the skeleton CU, though - >> so they can be deduplicated (& just put the dwo_id in this module >> somewhere, perhaps just using the DW_AT_dwo_id attribute - possibly that's >> the only attribute the DW_TAG_module would need, ideally). Unless we need >> to consider the submodule issue (in which case the skeleton unit would >> reference the whole module but the submodules would reference/describe the >> respective submodules?)? >> >> >> We cannot put them into the skeleton CU if the skeleton CU is going to be >> comdat’d, because we’d then have to refer to it via a signature and that >> leads us directly to the can of worms discussed in the next paragraph :-) >> >> >> >>> ... >>> >>> // Split DWARF skeleton, comdat'd. >>> .debug_info, group 0xFEDB9876, comdat >>> DW_TAG_compile_unit >>> >>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) >>> DW_AT_dwo_id(“0xFEDB9876”) >>> ... >>> >>> On Mach-O the split DWARF skeleton would not be a comdat’d, but >>> llvm-dsymutil can just ignore it. >>> >>> >>> If we want to dedup the TAG_module we need to refer to it via signature. >>> This means we need to wrap it in a type_unit or a DWARF5 TAG_type_unit. We >>> might as well throw it in with the skeleton CU. >>> >>> .debug_info: >>> DW_TAG_compile_unit >>> ... 
>>> DW_TAG_imported_module >>> DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234) >>> >>> // Split DWARF skeleton, comdat'd. >>> .debug_info, group 0xFEDB9876, comdat >>> DW_TAG_compile_unit >>> >>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) >>> DW_AT_dwo_id(“0xFEDB9876”) >>> ... >>> DW_TAG_type_unit (signature: 0xABCD1234) >>> >> >> Can't really put a type_unit inside a compile_unit - it'd need to be >> top-level with an appropriate type unit header, etc. & then we'd need two >> different units/headers, could still comdat them, but it's a weird abuse of >> type units & would probably confuse consumers. I don't know whether that's >> worth the effort. >> >> Oh right. >> >> >> >>> DW_TAG_module >>> DW_AT_name(“FooLib”) >>> DW_AT_LLVM_sysroot(“/“) >>> DW_AT_LLVM_include_dirs(“-I/path”) >>> DW_AT_LLVM_macros(“-DNDEBUG”) >>> ... >>> >>> Now that raises the question about what happens with multiple modules >>> within one PCM. >> >> >> Is the right term "submodule"? it's sort of confusing to talk about >> multiple modules within a pcm. >> >> >> Yes, a module with nested submodules. >> http://clang.llvm.org/docs/Modules.html#submodule-declaration >> >> >> >>> Assuming that the ELF linker is linking and deduping all the non-.dwo >>> sections, we may loose some of the TAG_modules (if not every CU imports all >>> submodules) in the binary, but that wouldn’t matter because the consumer >>> would find all TAG_modules by signature in the .pcm >> >> >> Is there any reason we need to reference the submodules individually, >> rather than just reference the whole module >> >> >> My assumption is that an AST-aware debugger will want to import the exact >> submodules that were imported by the CU before dropping into the expression >> evaluator to replicate the environment of the CU as much as possible. >> > > I'm just not picturing that. 
It seems pretty likely that a debugger user
> is more likely to treat the whole set of names in the program, not just
> those syntactically valid at that point in the source file.
>
>
> Module imports only work if the debugger has the precise list of modules
> imported by the current CU. Clang modules are not namespaces, and any two
> modules may conflict.
>

Right, as you say - ODR & C languages. (& I've no idea if file-scoped
static/anonymous namespace things can go in C++ modules and what happens if
you have conflicting modules in that regard - I guess they can conflict too?
Dunno - maybe anon namespaces in C++ modules aren't allowed)

> The cool thing is that with the imported modules the debugger effectively
> becomes clang and has the entire world visible to the current CU
> available, including any types and functions that never made it into the
> debug info because they were optimized out, or because there were
> uninstantiated templates that cannot be represented by DWARF.
>
> A simple example would be if I'm debugging LLVM and I'm in some generic
> optimization pass, but I want to cast my Instruction pointer to some
> specific instruction type to examine it in more detail - even though this
> pass doesn't care about that specific Instruction type nor include the
> header in which it's declared.
>
>
> If, however, the type lookup fails, the debugger can still fall back to
> the traditional behavior, find the type in the accelerator tables and
> reconstruct it from DWARF (if it is there).
>

So you're going to need to implement fission (to at least some degree)
support in LLDB, then? (to support the case where you haven't linked debug
info with llvm-dsymutil, but you've hit one of these lookup problems where
you need to cross possibly-conflicting modules)

OK, so I think it's probably reasonable for now to just add DW_TAG_modules
to the CU for each referenced module (or does it have to be each referenced
submodule? (can two submodules within a single module be
contradictory/conflicting?)). Since we don't have any good way to reference
the module as a foreign unit while deduplicating that unit... there's not
much point having the imported_module - but if you think it adds anything,
I'm open to ideas. Maybe later (when we have Bag O' DWARF) we can do that.

& only do this when targeting lldb (on by default on Darwin, off by default
elsewhere).

& LLDB, once it's got the Fission support it'll need for this anyway, will
fall back gracefully if these special modules are omitted.

- David

>
>> (& have just a single, whole module in the pcm)?
>>
>>
>> That’s probably not what you meant, but just to be sure: The pcm will
>> always have the entire module with all submodules in it. But the debugger
>> may choose to import only a subset of those.
>>
>>
>>> file referred to by whichever skeleton CU makes it into the binary:
>>>
>>> FooLib-XYZ.pcm
>>> ~~~~~~~~~~~~~~
>>>
>>> .debug_info.dwo
>>>   DW_TAG_compile_unit
>>>     DW_AT_dwo_id(“0xFEDB9876”)
>>>     ...
>>>
>>>   DW_TAG_type_unit (signature: 0xABCD1234)
>>>     DW_TAG_module
>>>       DW_AT_name(“FooLib”)
>>>       ...
>>>   DW_TAG_type_unit (signature: 0xCDEF3456)
>>>     DW_TAG_module
>>>       DW_AT_name(“FooLib”)
>>>       DW_TAG_module
>>>         DW_AT_name(“SubFoo”)
>>>         ...
>>>
>>> So.. this should work as long as nobody points out that a module isn’t
>>> really a type.
>>>
>>
>> Yeah, probably worth waiting for "Bag O DWARF".
>>
>> For now, as you mentioned earlier, maybe just putting the imported_module
>> and the module into the compile_unit when tuning for LLDB (so Darwin by
>> default, and anywhere else where someone tunes for LLDB in the future) &
>> leave them out otherwise.
>>
>>
>> Sounds perfectly reasonable.
>>
>>
>> Could you remind me why LLDB wants to know which modules are referenced
>> from a CU? (rather than just all the modules used by a program overall?)
>>
>>
>> LLDB uses clang for the expression evaluation.
Traditionally it would
>> look up a type in DWARF, build a clang AST out of it and then import it.
>> With this it could directly import the clang modules and have access to
>> everything in the module. But, clang modules are not namespaces, so modules
>> can conflict (and that would probably manifest as a crash in libclang).
>>
>
> What's an example of such a conflict? Is that valid (or is it just in ODR
> violations) - as mentioned above, it seems to me that only importing the
> things lexically available in this source file isn't what a debugger user
> would really want. I certainly think I'd trip over that a lot.
>
>
> Keep in mind that Objective-C (and C) do not have an ODR, so it’s not just
> “just” :-)
> Being able to import modules does not mean that the debugger cannot still
> fall back to loading types from DWARF; in fact it will have to do that for
> all local types anyway.
>
> -- adrian
>
>> It therefore needs to know which modules are imported in the current CU
>> before dropping into the expression evaluator.
>>
>> - adrian
>>
>>>
>>> On Mach-O, in the absence of comdats, we have:
>>>
>>> bar.o
>>> ~~~~~
>>>
>>> .debug_info:
>>>   DW_TAG_compile_unit
>>>     ...
>>>     DW_TAG_imported_module
>>>       DW_AT_import [DW_FORM_ref4] (0x20)
>>>
>>>     DW_TAG_module // uniqued by dsymutil.
>>>       DW_AT_name(“FooLib”)
>>>       DW_AT_LLVM_sysroot(“/”)
>>>       DW_AT_LLVM_include_dirs(“-I/path”)
>>>       DW_AT_LLVM_macros(“-DNDEBUG”)
>>>       ...
>>>
>>> // Split DWARF skeleton, thrown out by dsymutil.
>>>
>>
>> Thrown out? Because it's going to read everything in from the module and
>> merge it in to a single linked debug info blob, I take it?
>>
>>
>>> .debug_info, group 0xFEDB9876, comdat
>>>   DW_TAG_compile_unit
>>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>>     DW_AT_dwo_id(“0xFEDB9876”)
>>>     ...
>>>
>>> FooLib-XYZ.pcm
>>> ~~~~~~~~~~~~~~
>>>
>>> .debug_info:
>>>   DW_TAG_compile_unit
>>>     DW_AT_dwo_id(“0xFEDB9876”)
>>>     ...
>>> >>> DW_TAG_module >>> DW_AT_name(“FooLib”) >>> DW_TAG_module >>> DW_AT_name(“SubFoo”) >>> ... >>> >>> -- adrian >>> >>> > >>> >> >>> >>> >>> >>> > If it turns out that's the right way to get a target for the >>> imported_module, we could put both the skeleton CU and the partial unit in >>> the same comdat and dedup them both together. >>> >>> >>> >>> I think this works as long as we only have one TAG_module per .pcm >>> file (because we need to refer to it via signature). >>> >>> >>> >>> Not quite following here - why would we have more than one module >>> per pcm - a pcm is a module, right? >>> >> >>> >> Clang modules may have submodules and a compile unit could import two >>> submodules that live in the same .pcm file. For example on Darwin there is >>> a module Darwin.pcm that contains a submodule “C" that contains the >>> submodule “stdio". >>> > >>> > OK, so this bit's relevant to your use case in LLDB of loading the >>> right things for the right context, but not relevant to the context-less >>> debuggers like GDB that will just treat everything as one big namespace >>> (except for file-local things, etc). So it's important for your imported >>> modules but not for the basic Fission style debug reference. >>> > >>> > Well, maybe - I'm not sure what you're picturing in terms of the DWARF >>> in the module for submodules? If you want that granularity we'll have to >>> talk about how to split the DWARF in the module into chunks per submodule? >>> > >>> >> >>> >>> >>> >>> But if we don’t mind having duplicate dwo_* references in the same >>> .o file this would also work with more than one TAG_module (or submodules). >>> >>> >>> >>> >>> >>> .debug_info: >>> >>> DW_TAG_compile_unit >>> >>> DW_AT_name(“bar.c”) >>> >>> ... >>> >>> >>> >>> DW_TAG_imported_module // <- This could be optional on ELF. >>> >>> DW_AT_import [DW_FORM_ref_sig8] (0xFEDB9876) >>> >>> >>> >>> ... >>> >>> >>> >>> // Comdat’d split DWARF skeleton CU for the module Foo. 
>>> >>> .debug_info, group 0xFEDB9876, comdat >>> >>> DW_TAG_compile_unit >>> >>> >>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) >>> >>> DW_AT_dwo_id(“0xFEDB9876”) >>> >>> ... >>> >>> >>> >>> DW_TAG_module >>> >>> DW_AT_name(“FooLib”) >>> >>> DW_AT_LLVM_sysroot(“/“) >>> >>> DW_AT_LLVM_include_dirs(“-I/path”) >>> >>> DW_AT_LLVM_macros(“-DNDEBUG”) >>> >>> ... >>> >>> >>> >>> >>> >>> > >>> >>> > But this gets into complicated territory when the original binary >>> is built with fission... which will be relevant for modules on ELF with >>> LLDB. Hmm, maybe it's not too complicated - the partial_unit would end up >>> in the .dwo file (maybe we'd have to teach the .dwo file to deduplicate >>> these too - the same way it does for type units... - might require a new >>> header to include the hash, etc :/)... would be tricky to have the dwp tool >>> resolve the relocations to these things. Cross-unit references as you've >>> got there aren't something that every DWARF consumer is totally cool with, >>> I don't think? >>> >>> >>> >>> Ah. I thought the deduplication happens because all ELF sections >>> sharing the same group are uniqued based on the group id. >>> >>> >>> >>> COMDAT groups deduplicate for a normal non-fission build, but >>> fission linking doesn't require the .dwo file to use/contain COMDATs as it >>> uses a DWARF-aware tool (so you don't bother putting the type units in >>> COMDAT groups, for example - the fission linker knows how to parse >>> debug_types, find the type unit headers and their hashes and deduplicates >>> them that way). >>> >> >>> >> Ok that makes sense. >>> >> >>> >> -- adrian >>> >> >>> >>> >>> >>> It certainly would be nice if we could avoid introducing a new >>> .debug_info header... >>> >>> >>> >>> > >>> >>> > Sort of inclined to have the imported module stuff just for LLDB, >>> but I've lost some of the context for that in the ensuing weeks. 
>>> >>> >>> >>> -- adrian >>> >>> >>> >>> > >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> >> >>> >>> >> MachO (no typeunits, no comdats, with imports) >>> >>> >> ---------------------------------------------- >>> >>> >> >>> >>> >> Since we don’t have comdat sections in Mach-O and we don’t have >>> the tool support for type units, the way that external types can be >>> referenced necessarily needs to be a bit different. The design that Greg >>> and I came up with for Mach-O relies on llvm-dsymutil to fix up the DWARF >>> for non-module-aware consumers. Just as ELF DWARF consumers need not be >>> able to tell the difference between module debugging an split DWARF, on >>> Mach-O the .dSYM bundle generated by llvm-dsymutil looks like traditional >>> DWARF. >>> >>> >> >>> >>> >> There are three differences in the DWARF output that make this >>> possible: >>> >>> >> - Refer to external types by UID rather than by type signature. >>> >>> >> (This doubles as the key that allows a debugger to look >>> import the type >>> >>> >> directly from the AST and protects us against hash >>> collisions) >>> >>> >> - Add an index to the .o file that maps UID -> module file. >>> >>> >> (Fast lookup + UIDs for C and ObjC are only unique within a >>> module) >>> >>> >> - Add an entry for each type’s UID to the types accelerator >>> table. >>> >>> >> (Fast lookup) >>> >>> >> >>> >>> >> bar.o >>> >>> >> ~~~~~ >>> >>> >> >>> >>> >> .debug_info: >>> >>> >> DW_TAG_compile_unit >>> >>> >> DW_AT_name(“bar.c”) >>> >>> >> DW_TAG_imported_module >>> >>> >> DW_AT_import(DW_FORM_ref_addr 0x40) >>> >>> >> >>> >>> >> DW_TAG_variable >>> >>> >> DW_AT_name(“MyFoo”) >>> >>> >> DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”) // We could use a >>> custom FORM here >>> >>> >> >>> >>> >> // Skeleton unit. >>> >>> >> DW_TAG_compile_unit >>> >>> >> >>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) >>> >>> >> DW_AT_dwo_id(“0xFEDB9876”) >>> >>> >> ... 
>>> >>> >> 0x40:
>>> >>> >>   DW_TAG_module
>>> >>> >>     DW_AT_name(“FooLib”)
>>> >>> >>     DW_AT_LLVM_sysroot(“/”)
>>> >>> >>     DW_AT_LLVM_include_dirs(“-I/path”)
>>> >>> >>     DW_AT_LLVM_macros(“-DNDEBUG”)
>>> >>> >>
>>> >>> >> // This index uses the usual accelerator table format.
>>> >>> >> .apple_exttypes:
>>> >>> >>   { “_ZTS3Foo” => debug_str offset of “/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }
>>> >>> >>
>>> >>> >> FooLib-XYZ.pcm
>>> >>> >> ~~~~~~~~~~~~~~
>>> >>> >>
>>> >>> >> .debug_info
>>> >>> >>   DW_TAG_compile_unit
>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>>> >>> >>
>>> >>> >> 0x80:
>>> >>> >>   DW_TAG_structure_type
>>> >>> >>     DW_AT_name (“Foo”)
>>> >>> >>     DW_AT_signature
>>> >>> >>     ...
>>> >>> >>
>>> >>> >> // In addition to the entry for “Foo”, there is also an entry for
>>> >>> >> // the type’s UID “_ZTS3Foo” pointing to the type definition DIE.
>>> >>> >> .apple_types
>>> >>> >>   { “Foo” => 0x80 }
>>> >>> >>   { “_ZTS3Foo” => 0x80 }
>>> >>> >>
>>> >>> >> When the debug info linker (llvm-dsymutil) is run, it first pulls
>>> >>> >> in the .debug_info section from the clang module and fixes up all
>>> >>> >> the DW_FORM_strp external type references by turning them into a
>>> >>> >> DW_FORM_ref_addr that references the type in the
>>> >>> >> DW_TAG_compile_unit pulled in from the module. To find the correct
>>> >>> >> type DIE it looks up the UID in the .apple_exttypes index, finds
>>> >>> >> the module, looks up the UID in the regular .apple_types
>>> >>> >> accelerator table and replaces the temporary DW_FORM_strp with a
>>> >>> >> DW_FORM_ref_addr (which incidentally takes up the same amount of
>>> >>> >> space in the DIE).
>>> >>> >>
>>> >>> >> Thoughts?
>>> >>> >> --
>>> >>> >> adrian
>>> >>> >>
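
For anyone skimming the quoted Mach-O proposal: the two-step lookup Adrian describes for llvm-dsymutil (UID to module via .apple_exttypes, then UID to DIE via that module's .apple_types) can be sketched roughly as below. This is only a toy model, not llvm-dsymutil code. The dictionaries stand in for the accelerator tables, the function names are made up for illustration, and leaving an unresolved reference untouched is my assumption; the thread doesn't say what happens in that case. A real linker would also have to relocate the DIE offset into the linked output, which is ignored here.

```python
# Toy model of the proposed fixup. All names here are hypothetical; the
# tables are plain dicts standing in for .apple_exttypes and .apple_types.

PCM = "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm"

# .apple_exttypes in bar.o: type UID -> module file that defines the type.
exttypes = {"_ZTS3Foo": PCM}

# .apple_types per module file: name or UID -> offset of the type DIE.
module_types = {PCM: {"Foo": 0x80, "_ZTS3Foo": 0x80}}


def resolve_external_type(uid, exttypes, module_types):
    """Return (module path, DIE offset) for a UID, or None if a lookup fails."""
    module_path = exttypes.get(uid)
    if module_path is None:
        return None
    offset = module_types.get(module_path, {}).get(uid)
    if offset is None:
        return None
    return module_path, offset


def fix_up_type_refs(dies, exttypes, module_types):
    """Rewrite DW_FORM_strp type references into DW_FORM_ref_addr ones."""
    fixed = []
    for die in dies:
        form, value = die["type"]
        if form == "DW_FORM_strp":
            target = resolve_external_type(value, exttypes, module_types)
            if target is not None:
                # Same slot in the DIE, now holding an offset instead of a UID.
                die = {**die, "type": ("DW_FORM_ref_addr", target[1])}
            # Assumption: unresolved references are left as they were.
        fixed.append(die)
    return fixed


# The variable MyFoo from bar.o, referring to its type by UID.
dies = [{"name": "MyFoo", "type": ("DW_FORM_strp", "_ZTS3Foo")}]
linked = fix_up_type_refs(dies, exttypes, module_types)
```

With the example data from the quoted message, MyFoo's type reference ends up pointing at the DIE at 0x80 in the compile unit pulled in from the module.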
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
