Re: [Dwarf-Discuss] lambda (& other anonymous type) identification/naming

David Blaikie via Dwarf-Discuss Tue, 25 Jan 2022 08:14:11 -0800

On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl <apra...@apple.com> wrote:


>
>
> On Jan 23, 2022, at 2:53 PM, David Blaikie <dblai...@gmail.com> wrote:
>
> A rather common "quality of implementation" issue seems to be lambda
> naming.
>
> I came across this because Clang's lambda names in template parameters
> aren't canonical (they depend on how the source file is named), and GCC's
> seem to be very ambiguous:
>
> $ cat tmp/lambda.h
> template<typename T>
> void f1(T) { }
> static int i = (f1([]{}), 1);
> static int j = (f1([]{}), 2);
> void f1() {
>   f1([]{});
>   f1([]{});
> }
> $ cat tmp/lambda.cpp
> #ifdef I_PATH
> #include <tmp/lambda.h>
> #else
> #include "lambda.h"
> #endif
> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot lambda.o | grep "f1<"
>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:3:20)>")
>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:4:20)>")
>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:6:6)>")
>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:7:6)>")
> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep "f1<"
>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:3:20)>")
>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:4:20)>")
>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:6:6)>")
>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:7:6)>")
> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep "f1<"
>                 DW_AT_name      ("f1<f1()::<lambda()> >")
>                 DW_AT_name      ("f1<f1()::<lambda()> >")
>                 DW_AT_name      ("f1<<lambda()> >")
>                 DW_AT_name      ("f1<<lambda()> >")
>
> (I came across this in the context of my simplified template names work,
> which rebuilds names from the DW_TAG description of the template
> parameters. I'm not rebuilding names that have lambda parameters; those
> keep encoding the full string instead. The issue arises when some other
> type depends on a type with a lambda parameter, and that inner type is
> used from different translation units (using type units) that name the
> same file differently: the expected name has one spelling, but the actual
> spelling differs because of the "./".)
>
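A consumer could paper over this specific "./" mismatch by lexically normalizing the file part of Clang's "(lambda at FILE:LINE:COL)" strings before comparing them. This is a minimal sketch, not something proposed in this thread; the helper `normalize_lambda_name` is hypothetical, and it assumes the location always ends in ":LINE:COL":

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical consumer-side workaround (not a fix proposed in this thread):
// lexically normalize the FILE part of the first Clang-style
// "(lambda at FILE:LINE:COL)" in a name, so "./tmp/lambda.h" and
// "tmp/lambda.h" compare equal. Purely textual: no file system access.
std::string normalize_lambda_name(const std::string& name) {
    const std::string prefix = "(lambda at ";
    const auto start = name.find(prefix);
    if (start == std::string::npos) return name;  // no lambda location here
    const auto file_begin = start + prefix.size();
    const auto close = name.find(')', file_begin);
    if (close == std::string::npos) return name;
    // Assumes the location ends in ":LINE:COL", so FILE ends at the
    // second-to-last ':' before the closing parenthesis.
    const auto col_colon = name.rfind(':', close);
    if (col_colon == std::string::npos || col_colon <= file_begin) return name;
    const auto line_colon = name.rfind(':', col_colon - 1);
    if (line_colon == std::string::npos || line_colon <= file_begin) return name;
    const std::string file = name.substr(file_begin, line_colon - file_begin);
    const std::string normal =
        std::filesystem::path(file).lexically_normal().generic_string();
    return name.substr(0, file_begin) + normal + name.substr(line_colon);
}
```

This only reconciles different spellings of one path; it can't tell apart lambdas that share a location, so it's no substitute for a real identifying scheme.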
> All that said, it'd be good to figure out a reliable naming. The names we
> have here, while usable for humans (they point to source files, etc.),
> don't reliably give unique names for each lambda/template instantiation,
> which makes it difficult for a consumer to know whether two entities are
> the same (important for types: is some function parameter the same type
> as another type?)
>
> Cross-producer (e.g. trying to be compatible with both GCC and Clang debug
> info), some fuzzy matching is expected ("f1<int*>" vs "f1<int *>" at the
> most basic; there are more complicated cases), but in this case it isn't
> possible with the data available.
>
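The most basic kind of fuzzy matching can be sketched as whitespace normalization. This is an illustration under stated assumptions (the helper `normalize_type_name` is hypothetical, and real producer differences go well beyond whitespace): it drops whitespace except where it separates two identifier characters.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True for characters that can appear in an identifier or number.
static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical helper: drop whitespace unless it separates two identifier
// characters, so "f1<int *>" and "f1<int*>" become identical while
// "unsigned int" keeps its space; "> >" and ">>" also collapse together.
// It does NOT reconcile other differences such as "const int" vs "int const".
std::string normalize_type_name(const std::string& name) {
    std::string out;
    std::size_t i = 0;
    while (i < name.size()) {
        if (!std::isspace(static_cast<unsigned char>(name[i]))) {
            out += name[i++];
            continue;
        }
        // Skip the whole whitespace run, then emit a single space only if
        // it is needed to keep two identifier tokens apart.
        while (i < name.size() && std::isspace(static_cast<unsigned char>(name[i]))) ++i;
        if (!out.empty() && i < name.size() && is_ident_char(out.back()) &&
            is_ident_char(name[i]))
            out += ' ';
    }
    return out;
}
```

Normalizing GCC's "f1<<lambda()> >" strings still leaves the distinct lambdas with identical names: textual matching can't recover information the name doesn't carry.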
> The source file/line/column is insufficient to uniquely identify a lambda
> (multiple lambdas stamped out by a macro would all get the same
> file/line/col), and valid (albeit unlikely) code that writes the same
> definition in multiple places could give the same lambda different
> names.
>
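A small C++17 illustration of the macro case: one expansion of the hypothetical `TWO_LAMBDAS` macro yields two lambdas that, per the observation above, a file/line/col-based name may not tell apart, yet the language treats them as distinct closure types:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macro that stamps out two lambdas in one expansion.
#define TWO_LAMBDAS(a, b)      \
    auto a = [] { return 1; }; \
    auto b = [] { return 1; };

TWO_LAMBDAS(first, second)

// Every lambda expression has its own unique closure type, even when the
// bodies are token-for-token identical and come from the same macro use.
static_assert(!std::is_same_v<decltype(first), decltype(second)>);
```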
> We should probably identify these entities more like the way various ABI
> manglings do.
>
> And we should probably also do this for other unnamed types that have
> linkage (those that need to be, or would benefit from being, matched up
> between two CUs), not just lambdas.
>
> FWIW, at least the llvm-cxxfilt demanglings of Clang's manglings for
> these symbols are:
>
>  void f1<$_0>($_0)
>  void f1<$_1>($_1)
>  void f1<f1()::$_2>(f1()::$_2)
>  void f1<f1()::$_3>(f1()::$_3)
>
> Should we use that instead?
>
>
> The only other information that the current human-readable DWARF name
> carries is the file+line, and that is fully redundant with
> DW_AT_decl_file/DW_AT_decl_line, so the above scheme seems reasonable to
> me. Poorly symbolicated backtraces would be worse in this scheme, so I'm
> expecting most pushback from users who rely on a tool that just prints
> the human-readable name with no source info.
>

Yeah - you can always pull the file/line/col from the DW_AT_decl_* anyway,
so encoding it in the type name does seem redundant and inefficient indeed
(beyond/independent of the correctness issues).
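
For display purposes, a consumer can rebuild Clang's current label from the location attributes. A sketch with a hypothetical, simplified `DeclLocation` struct standing in for attributes already read from the DIE (a real consumer resolves DW_AT_decl_file through the line table's file names, and DW_AT_decl_column is only present if the producer emits it):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified view of the location attributes on a type's DIE.
struct DeclLocation {
    std::string file;      // resolved from DW_AT_decl_file
    std::uint64_t line;    // DW_AT_decl_line
    std::uint64_t column;  // DW_AT_decl_column, 0 if absent
};

// Rebuild a "(lambda at FILE:LINE[:COL])" label for display only; the
// identifying name itself no longer needs to carry the location.
std::string display_label(const DeclLocation& loc) {
    std::string label = "(lambda at " + loc.file + ":" + std::to_string(loc.line);
    if (loc.column != 0) label += ":" + std::to_string(loc.column);
    return label + ")";
}
```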

> GCC's mangling is different (in these examples that's OK, since they all
> have internal linkage):
>
>  void f1<f1()::'lambda0'()>(f1()::'lambda0'())
>  void f1<f1()::'lambda'()>(f1()::'lambda'())
>
> If I add an example like this:
>
> inline auto f2() { return []{}; }
>
> and instantiate the template with the result of f2, Clang's mangling
> demangles as:
>
>  void f1<f2()::'lambda'()>(f2()::'lambda'())
>
> GCC:
>
>  void f1<f2()::'lambda'()>(f2()::'lambda'())
>
> So they consistently use the same mangling; could we use the same naming
> for template parameters?
>
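For reference, the 'lambda'/'lambda0' spellings follow the Itanium C++ ABI's closure-type discriminator: within one context, the first closure type with a given parameter signature is unnumbered, and the nth (n >= 2) gets n-2. A sketch of that rule producing llvm-cxxfilt-style strings (the function `itanium_lambda_names` is hypothetical and covers only this numbering, not full mangling):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Given the parameter signatures of the lambdas in one context (e.g. the
// body of f1()), in lexical order, produce demangler-style names such as
// "'lambda'()" and "'lambda0'()".
std::vector<std::string> itanium_lambda_names(const std::vector<std::string>& sigs) {
    std::map<std::string, int> seen;  // signature -> closures seen so far
    std::vector<std::string> names;
    for (const auto& sig : sigs) {
        const int n = ++seen[sig];  // 1-based position among same-signature lambdas
        const std::string number = (n == 1) ? "" : std::to_string(n - 2);
        names.push_back("'lambda" + number + "'" + sig);
    }
    return names;
}
```

With two `()` lambdas in f1, this yields 'lambda'() and 'lambda0'(), matching the GCC demanglings quoted above.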
> How should we communicate this sort of identity for unnamed types in the
> DIEs describing the types themselves (not just in the name string of a
> template instantiated with the unnamed type), so the unnamed type can be
> matched up between translation units?
>
> eg, if I have these two translation units:
> // header
> inline auto f1() { struct { } local; return local; }
> // unit 1:
> #include "header"
> auto f2(decltype(f1())) { }
> // unit 2:
> #include "header"
> decltype(f1()) v1;
>
> Currently the DWARF produced for this unnamed type is:
> 0x0000003f:   DW_TAG_structure_type
>                 DW_AT_calling_convention        (DW_CC_pass_by_value)
>                 DW_AT_byte_size (0x01)
>                 DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
>                 DW_AT_decl_line (1)
>
>
> Is this the type of struct {}?
>

Yep. You'll get separate, distinct descriptions that are essentially
identical. Imagine if `f1` had two such types written as "struct {}" (say
they were used to instantiate two different specializations of a template:
"struct {} a; struct {} b; f_templ(a); f_templ(b);"). The DWARF will have
two of those unnamed DW_TAG_structure_types and two template
specializations, etc., but no way to know which of those unnamed types line
up with uses in another translation unit, in terms of overload resolution,
etc.
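
A compilable version of that scenario, assuming C++17 (`f_templ` and `two_specializations` are hypothetical names): two unnamed types with identical definitions are distinct types, so they instantiate two separate specializations, each with its own function-local static.

```cpp
#include <cassert>
#include <type_traits>

// Each specialization has its own function-local static, so the address
// returned identifies which specialization was called.
template <typename T>
const void* f_templ(T) {
    static int marker = 0;
    return &marker;
}

inline bool two_specializations() {
    struct {} a;  // two unnamed types with identical (empty) definitions
    struct {} b;
    static_assert(!std::is_same_v<decltype(a), decltype(b)>);
    // f_templ<decltype(a)> and f_templ<decltype(b)> are different functions.
    return f_templ(a) != f_templ(b);
}
```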

> So if you see that structure type definition in two different translation
> units, there's no way to know whether they refer to the same type, because
> multiple types may have the same DWARF description. (So there's no way to
> know whether the DWARF consumer should allow the user to evaluate the
> expression `f2(v1)`, I think?)
>
>
> Does a C++ compiler usually treat structurally equivalent but differently
> named types as interchangeable?
>

No - given "struct A { int i; }; struct B { int i; }; void f1(A); ... " -
"f1(A())" is valid, but "f1(B())" is invalid and an error at compile-time.
https://godbolt.org/z/de7Yce1qW


> Does a C++ compiler usually treat structurally equivalent anonymous types
> as interchangeable?
>

No, same rules apply as named types: https://godbolt.org/z/hxWMYbWc8
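
Both answers can be checked at compile time. A C++17 sketch with illustrative types (the godbolt links above show the corresponding overload-resolution errors):

```cpp
#include <cassert>
#include <type_traits>

// Named types with identical members are still different types.
struct A { int i; };
struct B { int i; };
static_assert(!std::is_same_v<A, B>);
static_assert(std::is_invocable_v<void (*)(A), A>);   // like f1(A())
static_assert(!std::is_invocable_v<void (*)(A), B>);  // like f1(B()): ill-formed

// Unnamed types follow the same rule: each definition is its own type.
static struct { int i; } x;
static struct { int i; } y;
static_assert(!std::is_same_v<decltype(x), decltype(y)>);
static_assert(!std::is_convertible_v<decltype(x), decltype(y)>);
```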


>
> -- adrian
>
>
> I guess the only way to have an unnamed type with linkage is to use it
> inside an inline function - so within that scope you'd have to produce
> DWARF for any types consistently in all definitions of the function and
> then a consumer could match them up by counting (assuming the unnamed types
> were always emitted in the same order in the child DIE list)...
>
> But this all seems a bit subtle & maybe would benefit from a more
> robust/explicit description?
>
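The counting approach might look roughly like this on the consumer side. The `Die` struct is a hypothetical stand-in for whatever a DWARF library provides, the sketch only handles DW_TAG_structure_type, and it relies entirely on every CU emitting the unnamed types in the same order:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified DIE model (vector of incomplete type
// is allowed since C++17).
struct Die {
    std::string tag;             // e.g. "DW_TAG_structure_type"
    std::string name;            // empty when DW_AT_name is absent
    std::vector<Die> children;
};

// Key each unnamed structure type by the enclosing inline function's name
// plus its ordinal among unnamed types in that function's child list.
std::vector<std::string> unnamed_type_keys(const Die& function) {
    std::vector<std::string> keys;
    int ordinal = 0;
    for (const Die& child : function.children) {
        if (child.tag == "DW_TAG_structure_type" && child.name.empty())
            keys.push_back(function.name + "::<unnamed #" +
                           std::to_string(ordinal++) + ">");
    }
    return keys;
}
```

Two CUs agree on a key only if both emitted the same unnamed types in the same order, which is exactly the subtle assumption noted above.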
> Perhaps adding an integer attribute to number anonymous types? They'd need
> to differentiate between lambdas and other anonymous types, since they have
> separate numberings.
>
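A sketch of how a producer might assign values for such an attribute. No such attribute exists in DWARF today, so `AnonKind`, `AnonNumber` and `number_anonymous_types` are all hypothetical; lambdas and other unnamed types get independent counters within one scope, as suggested above:

```cpp
#include <cassert>
#include <vector>

enum class AnonKind { Lambda, OtherUnnamed };

struct AnonNumber {
    AnonKind kind;
    unsigned index;  // value the hypothetical attribute would carry
};

// Number the anonymous types of one scope, in lexical order, keeping a
// separate sequence for each kind.
std::vector<AnonNumber> number_anonymous_types(const std::vector<AnonKind>& in_order) {
    unsigned next_lambda = 0, next_other = 0;
    std::vector<AnonNumber> out;
    for (AnonKind k : in_order) {
        unsigned& counter = (k == AnonKind::Lambda) ? next_lambda : next_other;
        out.push_back({k, counter++});
    }
    return out;
}
```

This is simpler than the Itanium scheme, which additionally numbers lambdas per parameter signature; whether an attribute should mirror that more closely is an open design question.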
>
>

_______________________________________________
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
