rnk added inline comments.
================ Comment at: lib/CodeGen/CGDebugInfo.cpp:812-814 + // CodeView types with C++ mangling need a type identifier. + if (CGM.getCodeGenOpts().EmitCodeView) + return true; ---------------- dblaikie wrote: > rnk wrote: > > dblaikie wrote: > > > bwyma wrote: > > > > dblaikie wrote: > > > > > rnk wrote: > > > > > > rnk wrote: > > > > > > > dblaikie wrote: > > > > > > > > Just came across this code today - it's /probably/ a problem > > > > > > > > for LTO. LLVM IR linking depends on the identifier to determine > > > > > > > > if two structs are the same for linkage purposes - so if an > > > > > > > > identifier is added for a non-linkage (local/not externally > > > > > > > > visible) type, LLVM will consider them to be the same even > > > > > > > > though they're unrelated: > > > > > > > > ``` > > > > > > > > namespace { struct t1 { int i; int j; }; } > > > > > > > > t1 v1; > > > > > > > > void *v3 = &v1; > > > > > > > > ``` > > > > > > > > ``` > > > > > > > > namespace { struct t1 { int i; }; } > > > > > > > > t1 v1; > > > > > > > > void *v2 = &v1; > > > > > > > > ``` > > > > > > > > ``` > > > > > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g -gcodeview && > > > > > > > > ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && > > > > > > > > ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep > > > > > > > > "\"t1\"" > > > > > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, > > > > > > > > name: "t1", scope: !9, file: !3, line: 1, size: 64, flags: > > > > > > > > DIFlagTypePassByValue, elements: !10, identifier: > > > > > > > > "_ZTSN12_GLOBAL__N_12t1E") > > > > > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g && > > > > > > > > ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && > > > > > > > > ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep > > > > > > > > "\"t1\"" > > > > > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, > > > > > > > > name: "t1", scope: !9, file: !3, line: 1, size: 64, flags: > > > > > > > > DIFlagTypePassByValue, elements: !10) > > > > > > > > !21 = distinct !DICompositeType(tag: DW_TAG_structure_type, > > > > > > > > name: "t1", scope: !9, file: !17, line: 1, size: 32, flags: > > > > > > > > DIFlagTypePassByValue, elements: !22) > > > > > > > > ``` > > > > > > > > > > > > > > > > So in linking, now both `a.cpp` and `b.cpp` refer to a single > > > > > > > > `t1` type (in this case, it looks like the first one - the 64 > > > > > > > > bit wide one). > > > > > > > > > > > > > > > > If CodeView actually can't represent these two distinct types > > > > > > > > with the same name in the same object file, so be it? But this > > > > > > > > looks like it's likely to cause problems for consumers/users. > > > > > > > If you use the MSVC ABI, we will assign unique identifiers to > > > > > > > these two structs: > > > > > > > ``` > > > > > > > !9 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: > > > > > > > "t1", scope: !10, file: !7, line: 1, size: 64, flags: > > > > > > > DIFlagTypePassByValue, elements: !11, identifier: > > > > > > > ".?AUt1@?A0xF964240C@@") > > > > > > > ``` > > > > > > > > > > > > > > The `A0xF964240C` is set up here, and it is based on the source > > > > > > > file path (the hash will only be unique when source file paths > > > > > > > are unique across the build): > > > > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/MicrosoftMangle.cpp#L464 > > > > > > > ``` > > > > > > > // It's important to make the mangled names unique because, > > > > > > > when CodeView > > > > > > > // debug info is in use, the debugger uses mangled type names > > > > > > > to distinguish > > > > > > > // between otherwise identically named types in anonymous > > > > > > > namespaces. > > > > > > > ``` > > > > > > > > > > > > > > Maybe this should use a "is MSVC ABI" check instead, since that > > > > > > > is what controls whether the identifier will be unique, and the > > > > > > > unique-ness is what matters for LTO and PDBs. > > > > > > After thinking about it some more, see the comment I added here: > > > > > > rG59320fc9d78237a5f9fdd6a5030d6554bb6976ce > > > > > > > > > > > > ``` > > > > > > // If the type is not externally visible, it should be unique to > > > > > > the current TU, > > > > > > // and should not need an identifier to participate in type > > > > > > deduplication. > > > > > > // However, when emitting CodeView, the format internally uses these > > > > > > // unique type name identifers for references between debug info. > > > > > > For example, > > > > > > // the method of a class in an anonymous namespace uses the > > > > > > identifer to refer > > > > > > // to its parent class. The Microsoft C++ ABI attempts to provide > > > > > > unique names > > > > > > // for such types, so when emitting CodeView, always use > > > > > > identifiers for C++ > > > > > > // types. This may create problems when attempting to emit CodeView > > > > > > when the MS > > > > > > // C++ ABI is not in use. > > > > > > ``` > > > > > > > > > > > > I think type identifiers are pretty crucial for CodeView > > > > > > functionality. The debugger won't be able to link a method to its > > > > > > class without them. Therefore, I think it's better to leave this as > > > > > > it is. The bad experience of duplicate type identifiers is better > > > > > > than the lack of functionality from not emitting identifiers at all > > > > > > for non-externally visible types. > > > > > Yeah, thanks for explaining/adding the comment - that about covers it. > > > > > > > > > > > > > > > Hmm, do these identifiers just need to only be unique within a single > > > > > object file to provide the name referencing mechanics? Perhaps we > > > > > could generate them later to ensure they're unique - rather than risk > > > > > collapsing them while linking? > > > > > > > > > > (at least here's an obscure case where they seem to collide: > > > > > Compiling the same file with different `-D` values, I guess the hash > > > > > is only for the filename, not for other properties that might impact > > > > > the output - so this ends up with two types with the same mangled > > > > > name even on MSVC ABI: > > > > > ``` > > > > > namespace { struct t1 { int i; > > > > > #ifdef SECOND > > > > > int j; > > > > > #endif > > > > > }; } > > > > > t1 v1; > > > > > void* > > > > > #ifdef SECOND > > > > > v2 > > > > > #else > > > > > v3 > > > > > #endif > > > > > = &v1; > > > > > ``` > > > > > ``` > > > > > clang++-tot -emit-llvm -S a.cpp -g -gcodeview -target x86_64-windows > > > > > && clang++-tot -emit-llvm -S a.cpp -DSECOND -g -gcodeview -o b.ll > > > > > -target x86_64-windows && ~/dev/llvm/build/default/bin/llvm-link -o > > > > > ab.bc a.ll b.ll && ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat > > > > > ab.ll | grep "\"t1\"" > > > > > ``` > > > > > ) > > > > As Reid already noted, the unique id is required in CodeView to match > > > > type definitions to forward references. Unfortunately, the unique id > > > > must align across translation units. Compiling the same file multiple > > > > times with preprocessor conditionals will result in two different types > > > > with the same unique id. You'll see the same behavior with Visual > > > > Studio 2019. > > > I think what I was getting at is that in the case of mangled names for > > > internal linkage things - presumably there's no ABI compatibility > > > concern? It's not like something compiled with MSVC or another version of > > > Clang has to use the same naming scheme for these local entities, so long > > > as the naming scheme doesn't collide? > > > > > > So at least for LTO if we generated the name later (in the CodeView > > > backend) it could be more unique - an LTO build wouldn't collapse the two > > > types together. It could still have an identifier (generated by the > > > CodeView backend) that was used to reference decls and defs within that > > > file. > > Yes, we could calculate the unique identifier during LTO. However, that > > seems like a very marginal improvement in the final user experience since > > most users don't use LTO for incremental development, which is when type > > information is most useful. > > > > On the other hand, if we want to establish the invariant that all > > DICompositeType identifiers are unique within a compile unit, we could > > probably go ahead and do that without impacting the user experience. We > > already have renaming logic like this for `llvm::GlobalValue`s, and > > frontends are check for existing values of that name if they don't want > > renaming to occur. > > if we want to establish the invariant that all DICompositeType identifiers > > are unique within a compile unit, we could probably go ahead and do that > > without impacting the user experience. We already have renaming logic like > > this for llvm::GlobalValues, and frontends are check for existing values of > > that name if they don't want renaming to occur. > > That invariant exists - that's the original purpose of the identifiers - LLVM > IR linking deduplicates on the basis of the identifier. So we can't/don't > rename them (because the point is to deduplicate based on them) - we keep the > first one we see and drop the rest. > > Any idea what happens when non-LTO with MSVC and you end up with duplicate > identifiers? Are both versions kept, but references end up ambiguous? Is the > second version seen dropped as being a duplicate/unneeded? > > If non-LTO MSVC behavior drops later instances of the identifier as > duplicate, then at least the current LTO behavior is no worse than that - > though there'd be some opportunity to make at least LTO better, but yeah, > marginally beneficial. Yes, the linker only retains one complete type definition for each unique type name, so you would get the same behavior with or without LTO. Repository: rC Clang CHANGES SINCE LAST ACTION https://reviews.llvm.org/D45438/new/ https://reviews.llvm.org/D45438 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits