Looks like https://reviews.llvm.org/D122766 (-ffile-reproducible) might solve my immediate issues in clang, but I think we should still consider moving to a more canonical naming of lambdas that, necessarily, doesn't include the file name (unfortunately). Probably has to include the lambda numbering/something roughly equivalent to the mangled lambda name - it could include type information (it'd be superfluous to a unique identifier, but I don't think it would break consistently naming the same type across CUs either).
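To make the idea a bit more concrete, here's a rough sketch of the difference between a location-based name and a numbering-based one. This is not what Clang or GCC does today; `LambdaInfo`, `location_name`, `ordinal_name` and the `<lambda #N>` spelling are all made up for illustration, and the per-scope ordinal is assumed to be assigned the way an ABI mangling would number lambdas.

```cpp
#include <string>

// Hypothetical description of one lambda as seen by one translation unit.
struct LambdaInfo {
  std::string file;   // path as spelled for this compilation
  unsigned line;
  unsigned column;
  std::string scope;  // enclosing entity, e.g. "f1()"; empty at namespace scope
  unsigned ordinal;   // assumed per-scope lambda number, mangling-style
};

// Location-based name, shaped like the "(lambda at ...)" names quoted below.
std::string location_name(const LambdaInfo &l) {
  return "(lambda at " + l.file + ":" + std::to_string(l.line) + ":" +
         std::to_string(l.column) + ")";
}

// Hypothetical path-independent name built from scope and ordinal only.
std::string ordinal_name(const LambdaInfo &l) {
  std::string prefix;
  if (!l.scope.empty())
    prefix = l.scope + "::";
  return prefix + "<lambda #" + std::to_string(l.ordinal) + ">";
}
```

With two descriptions of the same lambda that differ only in how the header path is spelled ("./tmp/lambda.h" vs "tmp/lambda.h"), `location_name` gives two different strings while `ordinal_name` gives the same one, which is the property I'm after.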
Anyone got ideas/preferences/thoughts on this?

On Mon, Jan 24, 2022 at 5:51 PM David Blaikie <dblai...@gmail.com> wrote:

> On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl <apra...@apple.com> wrote:
>
>> On Jan 23, 2022, at 2:53 PM, David Blaikie <dblai...@gmail.com> wrote:
>>
>> A rather common "quality of implementation" issue seems to be lambda
>> naming.
>>
>> I came across this due to non-canonicalization of lambda names in
>> template parameters depending on how a source file is named in Clang,
>> and GCC's names seem to be very ambiguous:
>>
>> $ cat tmp/lambda.h
>> template<typename T>
>> void f1(T) { }
>> static int i = (f1([]{}), 1);
>> static int j = (f1([]{}), 2);
>> void f1() {
>>   f1([]{});
>>   f1([]{});
>> }
>> $ cat tmp/lambda.cpp
>> #ifdef I_PATH
>> #include <tmp/lambda.h>
>> #else
>> #include "lambda.h"
>> #endif
>> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot lambda.o | grep "f1<"
>> DW_AT_name ("f1<(lambda at ./tmp/lambda.h:3:20)>")
>> DW_AT_name ("f1<(lambda at ./tmp/lambda.h:4:20)>")
>> DW_AT_name ("f1<(lambda at ./tmp/lambda.h:6:6)>")
>> DW_AT_name ("f1<(lambda at ./tmp/lambda.h:7:6)>")
>> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep "f1<"
>> DW_AT_name ("f1<(lambda at tmp/lambda.h:3:20)>")
>> DW_AT_name ("f1<(lambda at tmp/lambda.h:4:20)>")
>> DW_AT_name ("f1<(lambda at tmp/lambda.h:6:6)>")
>> DW_AT_name ("f1<(lambda at tmp/lambda.h:7:6)>")
>> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep "f1<"
>> DW_AT_name ("f1<f1()::<lambda()> >")
>> DW_AT_name ("f1<f1()::<lambda()> >")
>> DW_AT_name ("f1<<lambda()> >")
>> DW_AT_name ("f1<<lambda()> >")
>>
>> (I came across this in the context of my simplified template names
>> work - rebuilding names from the DW_TAG description of the template
>> parameters. I'm not rebuilding names that have lambda parameters
>> (those keep encoding the full string instead). The issue arises when
>> some other type depends on a type with a lambda parameter, and
>> multiple uses of that inner type exist, from different translation
>> units (using type units) with different ways of naming the same file -
>> so then the expected name has one spelling, but the actual spelling is
>> different due to the "./")
>>
>> But all this said - it'd be good to figure out a reliable naming. The
>> names we have here, while usable for humans (pointing to source files,
>> etc), don't reliably give unique names for each lambda/template
>> instantiation, which would make it difficult for a consumer to know if
>> two entities are the same (important for types - is some function
>> parameter the same type as another type?)
>>
>> While it's expected that cross-producer use (eg: trying to be
>> compatible with GCC and Clang debug info) requires some fuzzy matching
>> (eg: "f1<int*>" or "f1<int *>" at the most basic - there are more
>> complicated cases) - this one's not possible with the data available.
>>
>> The source file/line/column is insufficient to uniquely identify a
>> lambda (multiple lambdas stamped out by a macro would all get the same
>> file/line/col) and valid code (albeit unlikely) that writes the same
>> definition in multiple places could make the same lambda have
>> different names.
>>
>> We should probably use something more like the way various ABI
>> manglings do to identify these entities.
>>
>> But we should probably also do this for other unnamed types that have
>> linkage (need to/would benefit from being matched up between two CUs),
>> even if they're not lambdas.
>>
>> FWIW, at least the llvm-cxxfilt demanglings of clang's manglings for
>> these symbols are:
>>
>> void f1<$_0>($_0)
>> f1<$_1>($_1)
>> void f1<f1()::$_2>(f1()::$_2)
>> void f1<f1()::$_3>(f1()::$_3)
>>
>> Should we use that instead?
>>
>>
>> The only other information that the current human-readable DWARF name
>> carries is the file+line and that is fully redundant with
>> DW_AT_file/line, so the above scheme seems reasonable to me. Poorly
>> symbolicated backtraces would be worse in this scheme, so I'm
>> expecting most pushback from users who rely on a tool that just prints
>> the human readable name with no source info.
>>
>
> Yeah - you can always pull the file/line/col from the DW_AT_decl_*
> anyway, so encoding it in the type name does seem redundant and
> inefficient indeed (beyond/independent of the correctness issues).
>
>> GCC's mangling's different (in these examples that's OK, since they're
>> all internal linkage):
>>
>> void f1<f1()::'lambda0'()>(f1()::'lambda0'())
>> void f1<f1()::'lambda'()>(f1()::'lambda'())
>>
>> If I add an example like this:
>>
>> inline auto f1() { return []{}; }
>>
>> and instantiate the template with the result of f1:
>>
>> void f1<f2()::'lambda'()>(f2()::'lambda'())
>>
>> GCC:
>>
>> void f1<f2()::'lambda'()>(f2()::'lambda'())
>>
>> So they consistently use the same mangling - we could use the same
>> naming for template parameters?
>>
>> How should we communicate this sort of identity for unnamed types in
>> the DIEs describing the types themselves (not just the string of a
>> template name of a type instantiated with the unnamed type) so the
>> unnamed type can be matched up between translation units?
>>
>> eg, if I have these two translation units:
>> // header
>> inline auto f1() { struct { } local; return local; }
>> // unit 1:
>> #include "header"
>> auto f2(decltype(f1())) { }
>> // unit 2:
>> #include "header"
>> decltype(f1()) v1;
>>
>> Currently the DWARF produced for this unnamed type is:
>> 0x0000003f: DW_TAG_structure_type
>>   DW_AT_calling_convention (DW_CC_pass_by_value)
>>   DW_AT_byte_size (0x01)
>>   DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
>>   DW_AT_decl_line (1)
>>
>>
>> is this the type of struct {}?
>>
>
> Yep. You'll get separate distinct descriptions that are essentially
> the same - imagine if `f1` had two such types written as "struct {}"
> (say they were used to instantiate two different templates -
> "struct {} a; struct {} b; f_templ(a); f_templ(b);"). The DWARF will
> have two of those unnamed DW_TAG_structure_types and two template
> specializations, etc - but no way to know which of those unnamed types
> line up with uses in another translation unit, in terms of overload
> resolution, etc.
>
>> So there's no way to know, if you see that structure type definition
>> in two different translation units, whether they refer to the same
>> type, because there may be multiple types that have the same DWARF
>> description. (so no way to know if the DWARF consumer should allow the
>> user to evaluate an expression `f2(v1)` or not, I think?)
>>
>>
>> Does a C++ compiler usually treat structurally equivalent but
>> differently named types as interchangeable?
>>
>
> No - given "struct A { int i; }; struct B { int i; }; void f1(A); ... " -
> "f1(A())" is valid, but "f1(B())" is invalid and an error at
> compile-time.
> https://godbolt.org/z/de7Yce1qW
>
>
>> Does a C++ compiler usually treat structurally equivalent anonymous
>> types as interchangeable?
>>
>
> No, same rules apply as named types: https://godbolt.org/z/hxWMYbWc8
>
>
>>
>> -- adrian
>>
>>
>> I guess the only way to have an unnamed type with linkage is to use it
>> inside an inline function - so within that scope you'd have to produce
>> DWARF for any types consistently in all definitions of the function
>> and then a consumer could match them up by counting (assuming the
>> unnamed types were always emitted in the same order in the child DIE
>> list)...
>>
>> But this all seems a bit subtle & maybe would benefit from a more
>> robust/explicit description?
>>
>> Perhaps adding an integer attribute to number anonymous types? They'd
>> need to differentiate between lambdas and other anonymous types, since
>> they have separate numberings.
>>
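For reference, the macro case from the original message is easy to reproduce at the language level. A minimal sketch, assuming made-up macro names (`MAKE_LAMBDA`, `MAKE_PAIR`): two lambdas are stamped out from one source line, the language treats them as distinct types, and yet both see the same `__LINE__`. Whether a given producer also records the same column for both is the claim made in the thread, not something this snippet checks. The `A`/`B` assertion restates the godbolt example about structurally identical types.

```cpp
#include <type_traits>

// Hypothetical macro: every expansion spells the same lambda body.
#define MAKE_LAMBDA() [] { return __LINE__; }

// Hypothetical helper: declares two lambdas from a single source line.
#define MAKE_PAIR(a, b)   \
  auto a = MAKE_LAMBDA(); \
  auto b = MAKE_LAMBDA()

MAKE_PAIR(first, second);

// Each lambda expression introduces its own closure type.
static_assert(!std::is_same<decltype(first), decltype(second)>::value,
              "distinct lambda expressions have distinct types");

// Structurally identical named types are not interchangeable either.
struct A { int i; };
struct B { int i; };
static_assert(!std::is_same<A, B>::value, "A and B are different types");

// True when both closures report the same source line.
bool same_line() { return first() == second(); }
```

So a name built only from file and line cannot tell `first` and `second` apart, even though a consumer doing overload resolution has to.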
_______________________________________________
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org