dblaikie added inline comments.
================
Comment at: lib/CodeGen/CGDebugInfo.cpp:812-814
+ // CodeView types with C++ mangling need a type identifier.
+ if (CGM.getCodeGenOpts().EmitCodeView)
+ return true;
----------------
rnk wrote:
> dblaikie wrote:
> > bwyma wrote:
> > > dblaikie wrote:
> > > > rnk wrote:
> > > > > rnk wrote:
> > > > > > dblaikie wrote:
> > > > > > > Just came across this code today - it's /probably/ a problem for
> > > > > > > LTO. LLVM IR linking depends on the identifier to determine if
> > > > > > > two structs are the same for linkage purposes - so if an
> > > > > > > identifier is added for a non-linkage (local/not externally
> > > > > > > visible) type, LLVM will consider them to be the same even though
> > > > > > > they're unrelated:
> > > > > > > ```
> > > > > > > namespace { struct t1 { int i; int j; }; }
> > > > > > > t1 v1;
> > > > > > > void *v3 = &v1;
> > > > > > > ```
> > > > > > > ```
> > > > > > > namespace { struct t1 { int i; }; }
> > > > > > > t1 v1;
> > > > > > > void *v2 = &v1;
> > > > > > > ```
> > > > > > > ```
> > > > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g -gcodeview &&
> > > > > > > ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll &&
> > > > > > > ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep
> > > > > > > "\"t1\""
> > > > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name:
> > > > > > > "t1", scope: !9, file: !3, line: 1, size: 64, flags:
> > > > > > > DIFlagTypePassByValue, elements: !10, identifier:
> > > > > > > "_ZTSN12_GLOBAL__N_12t1E")
> > > > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g &&
> > > > > > > ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll &&
> > > > > > > ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep
> > > > > > > "\"t1\""
> > > > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name:
> > > > > > > "t1", scope: !9, file: !3, line: 1, size: 64, flags:
> > > > > > > DIFlagTypePassByValue, elements: !10)
> > > > > > > !21 = distinct !DICompositeType(tag: DW_TAG_structure_type, name:
> > > > > > > "t1", scope: !9, file: !17, line: 1, size: 32, flags:
> > > > > > > DIFlagTypePassByValue, elements: !22)
> > > > > > > ```
> > > > > > >
> > > > > > > So in linking, now both `a.cpp` and `b.cpp` refer to a single
> > > > > > > `t1` type (in this case, it looks like the first one - the 64 bit
> > > > > > > wide one).
> > > > > > >
> > > > > > > If CodeView actually can't represent these two distinct types
> > > > > > > with the same name in the same object file, so be it? But this
> > > > > > > looks like it's likely to cause problems for consumers/users.
> > > > > > If you use the MSVC ABI, we will assign unique identifiers to these
> > > > > > two structs:
> > > > > > ```
> > > > > > !9 = distinct !DICompositeType(tag: DW_TAG_structure_type, name:
> > > > > > "t1", scope: !10, file: !7, line: 1, size: 64, flags:
> > > > > > DIFlagTypePassByValue, elements: !11, identifier:
> > > > > > ".?AUt1@?A0xF964240C@@")
> > > > > > ```
> > > > > >
> > > > > > The `A0xF964240C` is set up here, and it is based on the source
> > > > > > file path (the hash will only be unique when source file paths are
> > > > > > unique across the build):
> > > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/MicrosoftMangle.cpp#L464
> > > > > > ```
> > > > > > // It's important to make the mangled names unique because, when
> > > > > > CodeView
> > > > > > // debug info is in use, the debugger uses mangled type names to
> > > > > > distinguish
> > > > > > // between otherwise identically named types in anonymous
> > > > > > namespaces.
> > > > > > ```
> > > > > >
> > > > > > Maybe this should use a "is MSVC ABI" check instead, since that is
> > > > > > what controls whether the identifier will be unique, and the
> > > > > > unique-ness is what matters for LTO and PDBs.
> > > > > After thinking about it some more, see the comment I added here:
> > > > > rG59320fc9d78237a5f9fdd6a5030d6554bb6976ce
> > > > >
> > > > > ```
> > > > > // If the type is not externally visible, it should be unique to the
> > > > > current TU,
> > > > > // and should not need an identifier to participate in type
> > > > > deduplication.
> > > > > // However, when emitting CodeView, the format internally uses these
> > > > > // unique type name identifers for references between debug info. For
> > > > > example,
> > > > > // the method of a class in an anonymous namespace uses the identifer
> > > > > to refer
> > > > > // to its parent class. The Microsoft C++ ABI attempts to provide
> > > > > unique names
> > > > > // for such types, so when emitting CodeView, always use identifiers
> > > > > for C++
> > > > > // types. This may create problems when attempting to emit CodeView
> > > > > when the MS
> > > > > // C++ ABI is not in use.
> > > > > ```
> > > > >
> > > > > I think type identifiers are pretty crucial for CodeView
> > > > > functionality. The debugger won't be able to link a method to its
> > > > > class without them. Therefore, I think it's better to leave this as
> > > > > it is. The bad experience of duplicate type identifiers is better
> > > > > than the lack of functionality from not emitting identifiers at all
> > > > > for non-externally visible types.
> > > > Yeah, thanks for explaining/adding the comment - that about covers it.
> > > >
> > > >
> > > > Hmm, do these identifiers just need to only be unique within a single
> > > > object file to provide the name referencing mechanics? Perhaps we could
> > > > generate them later to ensure they're unique - rather than risk
> > > > collapsing them while linking?
> > > >
> > > > (at least here's an obscure case where they seem to collide: Compiling
> > > > the same file with different `-D` values, I guess the hash is only for
> > > > the filename, not for other properties that might impact the output -
> > > > so this ends up with two types with the same mangled name even on MSVC
> > > > ABI:
> > > > ```
> > > > namespace { struct t1 { int i;
> > > > #ifdef SECOND
> > > > int j;
> > > > #endif
> > > > }; }
> > > > t1 v1;
> > > > void*
> > > > #ifdef SECOND
> > > > v2
> > > > #else
> > > > v3
> > > > #endif
> > > > = &v1;
> > > > ```
> > > > ```
> > > > clang++-tot -emit-llvm -S a.cpp -g -gcodeview -target x86_64-windows
> > > > && clang++-tot -emit-llvm -S a.cpp -DSECOND -g -gcodeview -o b.ll
> > > > -target x86_64-windows && ~/dev/llvm/build/default/bin/llvm-link -o
> > > > ab.bc a.ll b.ll && ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat
> > > > ab.ll | grep "\"t1\""
> > > > ```
> > > > )
> > > As Reid already noted, the unique id is required in CodeView to match
> > > type definitions to forward references. Unfortunately, the unique id must
> > > align across translation units. Compiling the same file multiple times
> > > with preprocessor conditionals will result in two different types with
> > > the same unique id. You'll see the same behavior with Visual Studio 2019.
> > I think what I was getting at is that in the case of mangled names for
> > internal linkage things - presumably there's no ABI compatibility concern?
> > It's not like something compiled with MSVC or another version of Clang has
> > to use the same naming scheme for these local entities, so long as the
> > naming scheme doesn't collide?
> >
> > So at least for LTO if we generated the name later (in the CodeView
> > backend) it could be more unique - an LTO build wouldn't collapse the two
> > types together. It could still have an identifier (generated by the
> > CodeView backend) that was used to reference decls and defs within that
> > file.
> Yes, we could calculate the unique identifier during LTO. However, that seems
> like a very marginal improvement in the final user experience since most
> users don't use LTO for incremental development, which is when type
> information is most useful.
>
> On the other hand, if we want to establish the invariant that all
> DICompositeType identifiers are unique within a compile unit, we could
> probably go ahead and do that without impacting the user experience. We
> already have renaming logic like this for `llvm::GlobalValue`s, and frontends
> are check for existing values of that name if they don't want renaming to
> occur.
> if we want to establish the invariant that all DICompositeType identifiers
> are unique within a compile unit, we could probably go ahead and do that
> without impacting the user experience. We already have renaming logic like
> this for llvm::GlobalValues, and frontends are check for existing values of
> that name if they don't want renaming to occur.
That invariant exists - that's the original purpose of the identifiers - LLVM
IR linking deduplicates on the basis of the identifier. So we can't/don't
rename them (because the point is to deduplicate based on them) - we keep the
first one we see and drop the rest.
Any idea what happens when non-LTO with MSVC and you end up with duplicate
identifiers? Are both versions kept, but references end up ambiguous? Is the
second version seen dropped as being a duplicate/unneeded?
If non-LTO MSVC behavior drops later instances of the identifier as duplicate,
then at least the current LTO behavior is no worse than that - though there'd
be some opportunity to make at least LTO better, but yeah, marginally
beneficial.
Repository:
rC Clang
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D45438/new/
https://reviews.llvm.org/D45438
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits