Consider the following C++ program:

#include <stdio.h>

class Base {
public:
  virtual const char* method1() = 0;
  void method2() {
    printf("%s\n", method1());
  }
};

class DerivedOne : public Base {
  virtual const char* method1() override {
    return "DerivedOne";
  }
};

template<typename T>
class DerivedTwo : public Base {
public:
  DerivedTwo(T t) : t(t) {}
private:
  virtual const char* method1() override {
    return t;
  }
  T t;
};

template<typename T>
class DerivedThree : public Base {
public:
  DerivedThree(T t) : t(t) {}
private:
  virtual const char* method1() override {
    return t();
  }
  T t;
};

int main() {
  DerivedOne d1;
  DerivedTwo d2("DerivedTwo");
  DerivedThree d3([]() {
    return "DerivedThree";
  });
  d1.method2();
  d2.method2();
  d3.method2();
  return 0;
}

If a debugger stops in method2, the DW_TAG_formal_parameter for `this`
tells the debugger that its type is a pointer to Base. Downcasting to
the derived type is very useful for the programmer, though, so both gdb
and lldb contain a feature to downcast based on the vtable pointer (the
"print object" and "target.prefer-dynamic-value" settings in the
respective debuggers).

The first part of this is straightforward. The DWARF for Base will
contain a member for the vtable pointer, and that plus knowledge of
how the ABI lays out vtables allows the debugger to effectively do a
dynamic_cast<void*> to obtain a pointer to the most derived object.
From there the vtable address is compared against the ELF symbol table
to find the mangled name of the vtable symbol.
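
As a rough illustration of the operation the debugger is emulating,
here is the language-level equivalent. The types Left, Right and Most
are hypothetical and not part of the program above; they exist only so
that the base subobject sits at a non-zero offset. A debugger cannot
call dynamic_cast itself; it gets the same answer by reading the vtable
pointer and the offset stored in the vtable.

```cpp
// Hypothetical types, used only to give a base subobject a non-zero offset.
struct Left  { virtual ~Left() = default;  int l = 1; };
struct Right { virtual ~Right() = default; int r = 2; };
struct Most : Left, Right { int m = 3; };

// Returns the address of the most derived object that `p` points into.
// dynamic_cast to a void pointer is defined to yield exactly that address.
const void* most_derived(const Right* p) {
  return dynamic_cast<const void*>(p);
}
```

Given a `Most obj;` and a `const Right* r = &obj;`, calling
most_derived(r) returns the address of obj rather than the address of
its Right subobject.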

Then things begin to get hairy. The debugger demangles the mangled
name that exists in the ELF symbol table, chops off the "vtable for "
prefix on the demangled name, and searches for the type by name in the
DWARF. If it finds the type, it adjusts the type of the value and
prints it accordingly. But this text-based matching doesn't always
work. There are no mangled names for types, so the debugger's
demangling has to match the compiler's output character for character.
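
The demangle-and-strip step can be sketched as follows. This is only a
sketch, not what gdb or lldb actually run: it assumes a GCC or Clang
toolchain that provides abi::__cxa_demangle in <cxxabi.h>, and the
helper name type_name_from_vtable_symbol is made up for illustration.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI runtimes)
#include <cstdlib>    // std::free
#include <string>

// Demangles a vtable symbol and strips the "vtable for " prefix, yielding
// the type name that would then be searched for in the DWARF.
// Returns an empty string if the input is not a demangleable vtable symbol.
std::string type_name_from_vtable_symbol(const char* mangled) {
  int status = 0;
  char* demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr) return {};
  std::string name(demangled);
  std::free(demangled);  // the buffer was allocated with malloc
  const std::string prefix = "vtable for ";
  if (name.compare(0, prefix.size(), prefix) != 0) return {};
  return name.substr(prefix.size());
}
```

For the non-template class in the program above, passing
"_ZTV10DerivedOne" yields "DerivedOne". The resulting string is only
useful if it is byte-for-byte identical to the DW_AT_name the compiler
wrote.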

In the example program I've provided, when using the respective
compilers, gdb can successfully downcast DerivedOne and DerivedTwo
but not DerivedThree. gdb fails because gcc emits the DW_TAG_class_type
with a DW_AT_name "DerivedThree<main()::<lambda()> >" but libiberty
demangles the vtable symbol to "vtable for
DerivedThree<main::{lambda()#1}>", and those do not match. lldb can
only successfully downcast DerivedOne; it appears not to handle classes
with template parameters correctly at all. And even if all of that
were fixed, libiberty and llvm disagree about how to demangle the
symbol for DerivedThree's vtable, so the two ecosystems would not be
interoperable.

Perhaps these are merely quality-of-implementation issues and belong
on the respective bug trackers. However, better representations are
possible. Rustc, for example, does not rely on the ELF symbol table
and demangled string matching. It emits a global variable in the DWARF
whose location is the address of the vtable. That variable has a
DW_AT_type pointing to a DW_TAG_class_type that describes the layout
of the vtable, and that type has a DW_AT_containing_type that points
to the type making use of that vtable.
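
To make the benefit concrete, here is a sketch of the lookup a debugger
could do with that kind of DWARF. Everything in it is hypothetical:
VtableEntry, VtableIndex and dynamic_type_for are illustrative names,
the index is assumed to have been filled in from the DWARF beforehand,
and none of this is taken from rustc or from an existing debugger. The
point is only that the lookup is keyed on an address and needs no
demangling or string matching.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified record of one vtable variable found in the DWARF:
// its size, plus the name of the type reached by following
// DW_AT_type and then DW_AT_containing_type.
struct VtableEntry {
  std::uint64_t size;           // byte size of the vtable object
  std::string containing_type;  // type that makes use of this vtable
};

// Keyed by the start address of each vtable (the variable's location).
using VtableIndex = std::map<std::uint64_t, VtableEntry>;

// Finds the type whose vtable covers `addr`, if any. A range lookup is used
// because a vtable pointer need not point at the first byte of the vtable.
std::optional<std::string> dynamic_type_for(const VtableIndex& index,
                                            std::uint64_t addr) {
  auto it = index.upper_bound(addr);  // first entry starting after addr
  if (it == index.begin()) return std::nullopt;
  --it;                               // last entry starting at or before addr
  if (addr - it->first >= it->second.size) return std::nullopt;
  return it->second.containing_type;
}
```

With an index containing an entry at 0x1000 of size 32 for DerivedOne,
a vtable pointer value of 0x1010 resolves to "DerivedOne", and an
address outside every recorded vtable resolves to nothing.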

Any thoughts?

- Kyle
-- 
Dwarf-discuss mailing list
Dwarf-discuss@lists.dwarfstd.org
https://lists.dwarfstd.org/mailman/listinfo/dwarf-discuss
