On Mon, May 4, 2015 at 4:46 PM, Adrian Prantl <[email protected]> wrote:
> > On May 4, 2015, at 1:31 PM, David Blaikie <[email protected]> wrote: > > ... > > >>> So you're going to need to implement fission (to at least some degree) >> support in LLDB, then? (to support the case where you haven't linked debug >> info with llvm-dsymutil, but you've hit one of these lookup problems where >> you need to cross possibly-conflicting modules) >> >> >> >> Yes. Specifically, it won’t support type units, and it will look up >> types by name rather than by signature. (cf. the second part of >> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html >> ) >> > >> > How are you going to reference the types in the module's fission CU >> without type units/signatures? Are you going to emit type declarations into >> the normal CU and rely on the debugger to know that these declarations can >> be resolved by looking elsewhere? (just without the benefit of constraining >> that search to just looking for a matching TU?) >> >> If you look at the example in >> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html, >> there will be an external type index (using the usual accelerator table >> format) that maps an external type’s UID to a pcm. In the pcm there is an >> extra accelerator table entry that maps UID to DIE offset. >> > > I mean I guess that's up to you, but seems like a relatively large > workaround compared to supporting type units... (I mean certainly seems > like strictly less work to do the workaround than implementing type units > in LLDB, but a relatively large amount of work to do/throw away eventually > once LLDB supports type units) > > > It’s not primarily meant to be a workaround so LLDB doesn’t need to > implement type units; the UIDs double as the key (decl context + name) to > import types directly from the AST. The other advantage is that we won’t > have to worry about MD5 hash collisions, but that’s more of a theoretical > advantage. > If there's a concern about collisions, we should fix that regardless (because we'll have the same problem with normal type units or modularized type units). If we've already got (& have to use in the case of DWARF fallback for modules) to support a hash or other numeric identifier, it might be good to use that for everything rather than having two mechanisms? > > > >> >> >> >> >> >> >>> >> >>> OK, so I think it's probably reasonable for now to just add >> DW_TAG_modules to the CU for each referenced module (or does it have to be >> each referenced submodule? (can two submodules within a single module be >> contradictory/conflicting?)). Since we don't have any good way to reference >> the module is a foreign unit while deduplicating that unit... there's not >> much point having the imported_module - but if you think it adds anything, >> I'm open to ideas. >> >> It could help keeping things simpler. >> >> Emitting it doesn’t add much semantic value because module imports >> always occur at the top level, but it will make the transition to the >> deduplicated TAG_modules easier — It could be easier to teach consumers >> once about imported_module({ref to TAG_module}) rather than having them >> also recognize top-level TAG_modules as an intermediate step. It’s also >> slightly easier to implement in LLVM because the imported_module allows us >> to anchor the TAG_module in the CU, but that’s not a very strong argument. >> > >> > Agreed on all counts (not a strong argument, but convenient enough, >> etc, etc). >> > >> > I'm still not entirely sure what the right answer is here, though, >> which is why I'm hesitant to bake anything in too strongly. >> > >> > To come back to one of the outstanding questions: Do you need submodule >> import information, or just module level (if modules cannot have internal >> conflicts and you can't avoid cross-module conflicts just by lack of >> visibility (I have no idea if either of those things are true) then you may >> just need per-module not per-submodule info)? >> >> At the moment I do not think that it makes sense for two submodules to >> conflict, but there is nothing in the clang documentation that explicitly >> forbids this. With this in mind, I think it is reasonable to not support >> submodules (at least initially) and always emit an import for the parent >> module. >> Thats what I wanted to write ... but I as I’m browsing through our >> documentation, >> http://clang.llvm.org/docs/Modules.html#conflict-declarations explicitly >> gives an example of two conflicting submodules, so maybe this is not a >> reasonable simplification after all. On the other hand, a quick grep over >> all system module maps on OS X doesn’t show a single conflict declaration. >> >> I still believe we do not need to support submodules right from the >> start, but we should have a story for getting there if we need to. >> > > Given the simple example that demonstrates the possibility, it seems fair > to have a story for what that looks like, yes - even if a first > pass/prototype doesn't support it. > > > Sure. > > > >> >> > >> > Also, does each submodule need different special attributes/flags? If >> the special codegen attributes you want are at the module level, it'd >> probably be best to keep those on the Skeleton CU for the module (that will >> be comdat folded, etc, on ELF - and they could be DWARF-aware deduplicated >> by llvm-dsymutil) so they're not duplicated. The DW_TAG_module would then >> just have a DW_AT_signature attribute or something similarly small/trivial >> to point to the skeleton CU. >> >> The attributes are derived from cc1 command line arguments. Not two >> submodules imported by one CU can have different attributes. All submodules >> in a pcm also share their attributes. Putting them into the skeleton CU >> appears to be the most efficient place to put them, though perhaps not the >> most logical one. >> > > Why not the most logical? It'd be nice if it were a DW_TAG_module instead > of a DW_TAG_compile_unit - but given the limited vocabulary we have in > DWARF top level tags, it seems as good as we can have. > > > I tend to view the module configuration (include path, isysroot, > configuration macros) to be a part of the module and not a part of the > skeleton that points to the split debug info for that module. > Sure, it's a workaround for the lack of Bag O' DWARF for now, one way or another. Either way the debugger's going to have to jump the > A module is uniquely identified by name + configuration. That’s why I feel > it should be part of the tag that also holds the name. > > > >> I would prefer to stick the attributes on the (top-level) DW_TAG_module >> and later deduplicate the attributes together with the DW_TAG_module. >> Sticking them on the skeleton won’t save any space in the .o files and >> would save 3*4-8=4 bytes (3x FORM_strp for include, macro, and isysroot - >> 1x FORM_ref_sig_8) per CU and imported module. > > > Seems nicer not to duplicate them, especially since not everyone will be > using a debug-aware linker like llvm-dsymutil (LLDB on Windows or Linux > won't have that convenience). Eventually we can use Bag O' DWARF for the > skeleton CU, make it a DW_TAG_module (with more DWARF changes to allow that > as a top-level tag, if desired/useful - I'm not sure it adds a lot) and > have the imported_module reference it that way. (DW_TAG_imported_module, > DW_AT_import, DW_FORM_ref_sig8) > > > I'm not /hugely/ invested in this, but we do have people caring about LLDB > on Linux and Windows, so avoiding tying the LLDB story to MachO and > dsymutil, etc, seems valuable. > > > I think that this would be an unnecessary intermediate step that we > eventually want to migrate away from anyway. We already identified that the > good solution for deduplication is going to be a skeleton TAG_module, so my > view is that it is not worth the trouble adding a temporary indirection > (and a new attribute name) > New attribute name? > to save 4 bytes in the intermediate step. > The debugger's going to need to resolve the skeleton anyway (in the case of non-AST based debugging) so I'm not sure how much it's an extra step. > I don’t actually think there is anything about the TAG_module design tying > this to either MachO or dsymutil, but let me know if you feel otherwise. > Sorry, what I was getting at was that with the Mach0/dsymutil/lldb story you probably don't need to consult the skeleton debug info (actually I forgot, you do - in the case where you need a name from a module that might be incompatible (it's not referenced directly in this CU)) - pre-dsymutil, you'd use the ASTs directly, post-dsymutil I expect you'll have inlined all the debug info so there are no skeletons, etc, if I'm understanding your design correctly. - David > > -- adrian > > > >> > > > > If you need submodule import lists, then each DW_AT_module representing >> a submodule would have a name (anything else?) and the signature refering >> to its module skeleton CU. >> >> What I’m envisioning is >> >> .debug_info: >> DW_TAG_compile_unit >> ... >> DW_TAG_imported_module >> // import FooSubA >> DW_AT_import [DW_FORM_ref4] (0x60) >> >> DW_TAG_module >> DW_AT_name(“FooLib”) >> DW_AT_LLVM_sysroot(“/“) >> DW_AT_LLVM_include_dirs(“-I/path”) >> DW_AT_LLVM_macros(“-DNDEBUG”) >> 0x60: >> DW_TAG_module >> DW_AT_name(“FooSubA”) >> // need not be emitted if not referenced. >> DW_TAG_module >> DW_AT_name(“FooSubASubA”) >> >> // need not be emitted if not referenced. >> DW_TAG_module >> DW_AT_name(“FooSubB”) >> >> >> >> -- adrian >> > >
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
