Re: [Dwarf-Discuss] lambda (& other anonymous type) identification/naming

David Blaikie via Dwarf-Discuss Sun, 24 Jul 2022 21:09:23 -0700

Ping on this thread - would love to hear what ideas folks have for
addressing the naming of anonymous types (enums, structs/classes, and
lambdas) - especially if it'd make it easier to go back/forth between
the DW_AT_name of a template with an unnamed type as a parameter and
the actual DIEs describing the same parameter type.


On Tue, Jun 14, 2022 at 1:02 PM David Blaikie <dblai...@gmail.com> wrote:
>
> Looks like https://reviews.llvm.org/D122766 (-ffile-reproducible) might solve 
> my immediate issues in clang, but I think we should still consider moving to 
> a more canonical naming of lambdas that, necessarily, doesn't include the 
> file name (unfortunately). Probably has to include the lambda 
> numbering/something roughly equivalent to the mangled lambda name - it could 
> include type information (it'd be superfluous to a unique identifier, but I 
> don't think it would break consistently naming the same type across CUs 
> either).
>
> Anyone got ideas/preferences/thoughts on this?
>
> On Mon, Jan 24, 2022 at 5:51 PM David Blaikie <dblai...@gmail.com> wrote:
>>
>> On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl <apra...@apple.com> wrote:
>>>
>>>
>>>
>>> On Jan 23, 2022, at 2:53 PM, David Blaikie <dblai...@gmail.com> wrote:
>>>
>>> A rather common "quality of implementation" issue seems to be lambda naming.
>>>
>>> I came across this because Clang's lambda names in template parameters
>>> aren't canonical (they depend on how the source file is named), and GCC's
>>> names seem to be very ambiguous:
>>>
>>> $ cat tmp/lambda.h
>>> template<typename T>
>>> void f1(T) { }
>>> static int i = (f1([]{}), 1);
>>> static int j = (f1([]{}), 2);
>>> void f1() {
>>>   f1([]{});
>>>   f1([]{});
>>> }
>>> $ cat tmp/lambda.cpp
>>> #ifdef I_PATH
>>> #include <tmp/lambda.h>
>>> #else
>>> #include "lambda.h"
>>> #endif
>>> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot lambda.o | grep "f1<"
>>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:3:20)>")
>>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:4:20)>")
>>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:6:6)>")
>>>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:7:6)>")
>>> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep "f1<"
>>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:3:20)>")
>>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:4:20)>")
>>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:6:6)>")
>>>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:7:6)>")
>>> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep "f1<"
>>>                 DW_AT_name      ("f1<f1()::<lambda()> >")
>>>                 DW_AT_name      ("f1<f1()::<lambda()> >")
>>>                 DW_AT_name      ("f1<<lambda()> >")
>>>                 DW_AT_name      ("f1<<lambda()> >")
>>>
>>> (I came across this in the context of my simplified template names work -
>>> rebuilding names from the DW_TAG description of the template parameters.
>>> I'm not rebuilding names that have lambda parameters (I keep encoding the
>>> full string instead), but the issue arises when some other type depends on
>>> a type with a lambda parameter and multiple uses of that inner type exist,
>>> from different translation units (using type units), with different ways
>>> of naming the same file - so the expected name has one spelling, but the
>>> actual spelling is different due to the "./".)
>>>
>>> But all this said - it'd be good to figure out a reliable naming scheme.
>>> The names we have here are usable for humans (they point to source files,
>>> etc), but they don't reliably give unique names for each lambda/template
>>> instantiation, which makes it difficult for a consumer to know whether two
>>> entities are the same (important for types - is some function parameter
>>> the same type as another type?)
>>>
>>> While it's expected that cross-producer matching (eg: trying to be
>>> compatible with both GCC and Clang debug info) needs some fuzzy matching
>>> (eg: "f1<int*>" vs "f1<int *>" at the most basic - there are more
>>> complicated cases), this case can't be matched at all with the data
>>> available.
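A naive sketch of the kind of spacing normalization a consumer might apply before comparing names from different producers. The function `normalize_spacing` is hypothetical and only handles whitespace, not real C++ name grammar. It also illustrates why the lambda case is different: no whitespace cleanup makes Clang's `(lambda at ./tmp/lambda.h:3:20)` agree with GCC's `<lambda()>`, or even with Clang's own `tmp/lambda.h` spelling.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true for characters that can appear in an identifier.
inline bool is_ident_char(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical, naive normalization: drop runs of spaces unless they separate
// two identifier characters (so "unsigned  int" keeps one space, while
// "int *" becomes "int*" and "> >" becomes ">>").
inline std::string normalize_spacing(const std::string& name) {
  std::string out;
  for (std::size_t i = 0; i < name.size(); ++i) {
    if (name[i] != ' ') {
      out += name[i];
      continue;
    }
    // Skip the whole run of spaces starting at i.
    std::size_t next = i;
    while (next < name.size() && name[next] == ' ') ++next;
    // Keep a single space only between two identifier characters.
    if (!out.empty() && next < name.size() && is_ident_char(out.back()) &&
        is_ident_char(name[next]))
      out += ' ';
    i = next - 1;  // the loop's ++i moves to the first non-space character
  }
  return out;
}
```

With this, `normalize_spacing("f1<int *>")` and `normalize_spacing("f1<int*>")` compare equal, but the Clang and GCC spellings of the lambda specializations still differ.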
>>>
>>> The source file/line/column is insufficient to uniquely identify a lambda 
>>> (multiple lambdas stamped out by a macro would get all the same 
>>> file/line/col) and valid code (albeit unlikely) that writes the same 
>>> definition in multiple places could make the same lambda have different 
>>> names.
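A small illustration of the macro point. `TWO_CALLBACKS`, `cb1` and `cb2` are hypothetical names. Both lambdas come from a single macro expansion on one line, so a name built only from where the macro was expanded can't tell them apart, yet the language treats them as two distinct closure types.

```cpp
#include <type_traits>

// Hypothetical macro that stamps out two textually identical lambdas.
#define TWO_CALLBACKS(a, b) auto a = [] {}; auto b = [] {};

// One expansion, one source line: two lambda-expressions.
TWO_CALLBACKS(cb1, cb2)

// Every lambda-expression has its own unique closure type.
static_assert(!std::is_same<decltype(cb1), decltype(cb2)>::value,
              "each lambda-expression has a distinct closure type");
```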
>>>
>>> We should probably use something more like the way various ABI manglings do 
>>> to identify these entities.
>>>
>>> But we should probably also do this for other unnamed types that have
>>> linkage (that need to, or would benefit from being, matched up between two
>>> CUs), even ones that aren't lambdas.
>>>
>>> FWIW, at least the llvm-cxxfilt demanglings of Clang's manglings for these
>>> symbols are:
>>>
>>>  void f1<$_0>($_0)
>>>  void f1<$_1>($_1)
>>>  void f1<f1()::$_2>(f1()::$_2)
>>>  void f1<f1()::$_3>(f1()::$_3)
>>>
>>> Should we use that instead?
>>>
>>>
>>> The only other information that the current human-readable DWARF name
>>> carries is the file+line, and that is fully redundant with
>>> DW_AT_decl_file/DW_AT_decl_line, so the above scheme seems reasonable to
>>> me. Poorly symbolicated backtraces would be worse under this scheme, so
>>> I'm expecting most pushback from users who rely on a tool that just prints
>>> the human-readable name with no source info.
>>
>>
>> Yeah - you can always pull the file/line/col from the DW_AT_decl_* anyway, 
>> so encoding it in the type name does seem redundant and inefficient indeed 
>> (beyond/independent of the correctness issues).
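A sketch of how a consumer could still show a location-rich label for backtraces without baking the location into DW_AT_name. `UnnamedTypeInfo` and `display_name` are hypothetical, and the `f1()::$_2` canonical name assumes the mangling-based scheme suggested above; the location fields stand in for the DW_AT_decl_* values producers already emit.

```cpp
#include <cassert>
#include <string>

// Hypothetical consumer-side view of an unnamed type's DIE.
struct UnnamedTypeInfo {
  std::string canonical_name;  // assumed location-free name, e.g. "f1()::$_2"
  std::string decl_file;       // from DW_AT_decl_file
  unsigned decl_line;          // from DW_AT_decl_line
  unsigned decl_column;        // from DW_AT_decl_column
};

// Build a human-friendly label on demand; identity comparisons use only
// canonical_name, so differing path spellings across CUs don't matter.
inline std::string display_name(const UnnamedTypeInfo& t) {
  return t.canonical_name + " (at " + t.decl_file + ":" +
         std::to_string(t.decl_line) + ":" + std::to_string(t.decl_column) +
         ")";
}
```

Two CUs that spell the header as `./tmp/lambda.h` and `tmp/lambda.h` would then agree on the canonical name and differ only in the displayed location.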
>>>
>>> GCC's mangling's different (in these examples that's OK, since they're all 
>>> internal linkage):
>>>
>>>  void f1<f1()::'lambda0'()>(f1()::'lambda0'())
>>>  void f1<f1()::'lambda'()>(f1()::'lambda'())
>>>
>>> If I add an example like this:
>>>
>>> inline auto f2() { return []{}; }
>>>
>>> and instantiate the template with the result of f2, Clang's demangling is:
>>>
>>>  void f1<f2()::'lambda'()>(f2()::'lambda'())
>>>
>>> GCC:
>>>
>>>  void f1<f2()::'lambda'()>(f2()::'lambda'())
>>>
>>> So they consistently use the same mangling - we could use the same naming 
>>> for template parameters?
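For reference, the 'lambda'/'lambda0' spellings come from the Itanium C++ ABI's discriminators: closure types are numbered within their enclosing context (per lambda signature), other unnamed types get their own sequence, and the number is omitted for the first one and is n-2 for the nth. Below is a small sketch of that numbering, rendered the way llvm-cxxfilt prints it. The class `UnnamedTypeNamer` is hypothetical and ignores details such as lambdas in default arguments or variable initializers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch of Itanium-style discriminators for unnamed types.
class UnnamedTypeNamer {
 public:
  // Lambdas are numbered per (enclosing context, lambda signature).
  std::string next_lambda(const std::string& context, const std::string& sig) {
    unsigned n = lambda_counts_[std::make_pair(context, sig)]++;
    return context + "::'lambda" + suffix(n) + "'(" + sig + ")";
  }
  // Other unnamed types use a separate sequence per context.
  std::string next_unnamed(const std::string& context) {
    unsigned n = unnamed_counts_[context]++;
    return context + "::'unnamed" + suffix(n) + "'";
  }

 private:
  // First entity: no number; second: 0; third: 1; ... (n-2 for the nth).
  static std::string suffix(unsigned n) {
    return n == 0 ? "" : std::to_string(n - 1);
  }
  std::map<std::pair<std::string, std::string>, unsigned> lambda_counts_;
  std::map<std::string, unsigned> unnamed_counts_;
};
```

Because the numbering depends only on the source of the inline function, every CU that sees the same definition would derive the same names, which is what makes them usable for matching.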
>>>
>>> How should we communicate this sort of identity for unnamed types in the
>>> DIEs describing the types themselves (not just in the name string of a
>>> template instantiated with the unnamed type), so the unnamed type can be
>>> matched up between translation units?
>>>
>>> eg, if I have these two translation units:
>>> // header
>>> inline auto f1() { struct { } local; return local; }
>>> // unit 1:
>>> #include "header"
>>> auto f2(decltype(f1())) { }
>>> // unit 2:
>>> #include "header"
>>> decltype(f1()) v1;
>>>
>>> Currently the DWARF produced for this unnamed type is:
>>> 0x0000003f:   DW_TAG_structure_type
>>>                 DW_AT_calling_convention        (DW_CC_pass_by_value)
>>>                 DW_AT_byte_size (0x01)
>>>                 DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
>>>                 DW_AT_decl_line (1)
>>>
>>>
>>> is this the type of struct {}?
>>
>>
>> Yep. You'll get separate, distinct descriptions that are essentially the
>> same. Imagine if `f1` had two such types written as "struct {}" (say they
>> were used to instantiate two specializations of the same template -
>> "struct {} a; struct {} b; f_templ(a); f_templ(b);"): the DWARF will have
>> two of those unnamed DW_TAG_structure_types and two template
>> specializations, etc - but there's no way to know which of those unnamed
>> types lines up with which use in another translation unit, in terms of
>> overload resolution, etc.
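A compilable version of the `struct {} a; struct {} b; f_templ(a); f_templ(b);` example. `f_templ` comes from the text; its body and the function `instantiates_twice` are hypothetical additions so the result can be checked: each specialization of `f_templ` owns a distinct static object, so differing addresses show two specializations were instantiated, even though the DWARF for the two structs (no name, byte size 1, no members) would look the same.

```cpp
#include <cassert>
#include <type_traits>

// One static object per specialization; its address identifies which
// specialization of f_templ was called.
template <typename T>
const void* f_templ(T) {
  static int instance;
  return &instance;
}

inline bool instantiates_twice() {
  // Two textually identical unnamed structs in one inline function.
  struct {} a;
  struct {} b;
  static_assert(!std::is_same<decltype(a), decltype(b)>::value,
                "structurally identical unnamed types are still distinct");
  // Distinct types, so these calls go to two different specializations.
  return f_templ(a) != f_templ(b);
}
```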
>>>
>>> So if you see that structure type definition in two different translation
>>> units, there's no way to know whether they refer to the same type, because
>>> multiple types may have the same DWARF description. (So there's no way to
>>> know whether the DWARF consumer should allow the user to evaluate an
>>> expression `f2(v1)` or not, I think?)
>>>
>>>
>>> Does a C++ compiler usually treat structurally equivalent but differently 
>>> named types as interchangeable?
>>
>>
>> No - given "struct A { int i; }; struct B { int i; }; void f1(A); ... " - 
>> "f1(A())" is valid, but "f1(B())" is invalid and an error at compile-time. 
>> https://godbolt.org/z/de7Yce1qW
>>
>>>
>>> Does a C++ compiler usually treat structurally equivalent anonymous types 
>>> as interchangeable?
>>
>>
>> No, same rules apply as named types: https://godbolt.org/z/hxWMYbWc8
>>
>>>
>>>
>>> -- adrian
>>>
>>>
>>> I guess the only way to have an unnamed type with linkage is to use it 
>>> inside an inline function - so within that scope you'd have to produce 
>>> DWARF for any types consistently in all definitions of the function and 
>>> then a consumer could match them up by counting (assuming the unnamed types 
>>> were always emitted in the same order in the child DIE list)...
>>>
>>> But this all seems a bit subtle & maybe would benefit from a more 
>>> robust/explicit description?
>>>
>>> Perhaps adding an integer attribute to number anonymous types? They'd need 
>>> to differentiate between lambdas and other anonymous types, since they have 
>>> separate numberings.
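A minimal sketch of the explicit-numbering idea, assuming a hypothetical integer attribute (no such DWARF attribute exists today; `UnnamedKind`, `UnnamedTypeDie`, `assign_indices` and `UnnamedTypeKey` are made up for illustration). The producer numbers every unnamed type in the inline function in source order, with separate sequences for lambdas and other unnamed types, so the number stays stable even when a CU emits only some of them. It simplifies the ABI's scheme by using one lambda sequence instead of one per lambda signature.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Lambdas and other unnamed types get separate numbering sequences.
enum class UnnamedKind { Lambda, Other };

// Hypothetical, simplified model of an unnamed type's DIE.
struct UnnamedTypeDie {
  UnnamedKind kind;
  unsigned index;  // stands in for the proposed (hypothetical) attribute
};

// Producer side: number all unnamed types declared in the function, in
// source order, before deciding which ones to emit in this CU.
inline void assign_indices(std::vector<UnnamedTypeDie>& decls) {
  unsigned next_lambda = 0, next_other = 0;
  for (auto& die : decls)
    die.index = (die.kind == UnnamedKind::Lambda) ? next_lambda++ : next_other++;
}

// Consumer side: identify an unnamed type by enclosing scope, kind and
// index, rather than by its position among the emitted child DIEs.
struct UnnamedTypeKey {
  std::string scope;
  UnnamedKind kind;
  unsigned index;
  bool operator==(const UnnamedTypeKey& o) const {
    return std::tie(scope, kind, index) == std::tie(o.scope, o.kind, o.index);
  }
};
```

For example, if one CU emits only the second non-lambda unnamed type of `f1()`, counting that CU's children would give it position 0, while the attribute still says 1, so it matches the other CU's description.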
>>>
>>>
_______________________________________________
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
