I've put a WIP patch up here: https://reviews.llvm.org/D44668
Sorry for the delay!
Erik
On 2018-01-26 3:56 PM, Greg Clayton wrote:
On Jan 26, 2018, at 8:38 AM, Erik Pilkington
<erik.pilking...@gmail.com> wrote:
On 2018-01-25 1:58 PM, Greg Clayton wrote:
On Jan 25, 2018, at 10:25 AM, Erik Pilkington
<erik.pilking...@gmail.com> wrote:
Hi,
I'm not at all familiar with LLDB, but I've been doing some work on
the demangler in libcxxabi. It's still a work in progress and I
haven't yet copied the changes over to ItaniumDemangle, which AFAIK
is what lldb uses. The demangler in libcxxabi now demangles the
symbol you attached in 3.31 seconds, instead of 223.54 on my
machine. I posted an RFC on my work here
(http://lists.llvm.org/pipermail/llvm-dev/2017-June/114448.html),
but basically the new demangler just produces an AST then traverses
it to print the demangled name.
Great to hear the huge speedup in demangling! LLDB actually has two
demanglers: a fast one that can demangle 99% of names, and we fall
back to ItaniumDemangle which can do all names but is really slow.
It would be fun to compare your new demangler with the fast one and
see if we can get rid of the fast demangler now.
I think a good way of making this even faster is to have LLDB
consume the AST the demangler produces directly. The AST is a
better representation of the information that LLDB wants, and
finishing the demangle and then fishing out that information from
the output string is unfortunate. From the AST, it would be really
straightforward to just individually print all the components of
the name that LLDB wants.
This would help us to grab the important bits out of the mangled
name as well. We chop up a demangled name to find the base name
("string" for std::string) and the containing context ("std::" for
std::string), and we check whether the function is a method (by
looking for a trailing "const" modifier on the function) or a
top-level function. The mangling doesn't fully specify what is a
namespace and what is a class: in "foo::bar::baz()" we don't know
whether "foo" or "bar" are classes or namespaces. So the AST would
be great as long as it is fast.
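To make the string chopping described above concrete, here is a minimal sketch of splitting an already-demangled name into its containing context and base name. The helper name splitQualifiedName is hypothetical and is not LLDB's actual parser; it only tracks angle brackets and parentheses, so names such as operator< would confuse it.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not LLDB code): split a demangled name at the last
// "::" that is not nested inside template arguments or a parameter list.
// Returns {context, basename}, e.g. {"std::", "string"} for "std::string".
std::pair<std::string, std::string>
splitQualifiedName(const std::string &name) {
  int depth = 0;                              // nesting depth of <...> and (...)
  std::size_t split = std::string::npos;      // index of the last top-level "::"
  for (std::size_t i = 0; i + 1 < name.size(); ++i) {
    char c = name[i];
    if (c == '<' || c == '(') {
      ++depth;
    } else if (c == '>' || c == ')') {
      --depth;
    } else if (depth == 0 && c == ':' && name[i + 1] == ':') {
      split = i;
      ++i;                                    // skip the second ':'
    }
  }
  if (split == std::string::npos)
    return {"", name};                        // no qualifier at all
  return {name.substr(0, split + 2), name.substr(split + 2)};
}
```

Note that nothing in the string tells the caller whether a qualifier such as "foo" names a class or a namespace, which is the limitation the paragraph above points out.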
Most of the time it takes to demangle these "symbols from hell" is
during the printing, after the AST has been parsed, because the
demangler has to flatten out all the potentially nested back
references. Just parsing to an AST should be about proportional to
the strlen of the mangled name. Since (AFAIK) LLDB doesn't use some
sections of the demangled name often (such as parameters), from the
AST LLDB could lazily decide not to even bother fully demangling
some sections of the name, then if it ever needs them it could
parse a new AST and get them from there. I think this would largely
fix the issue, as most of the time these crazy expansions don't
occur in the name itself, but in the parameters or return type.
Even when they do appear in the name, it would be possible to do
some simple name classification (i.e., does this symbol refer to a
function) or pull out the basename quickly without expanding
anything at all.
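The cost asymmetry described above can be illustrated with a toy AST in which back references are modelled as shared child pointers. This is only a sketch: Node, print, and buildChain are hypothetical names and do not correspond to the node types in libcxxabi. The number of nodes grows linearly with the number of levels, while the flattened output doubles at every level.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy AST node (hypothetical). A child pointer that appears more than once
// plays the role of a back reference to an earlier component.
struct Node {
  std::string text;                // literal text for leaf nodes
  std::vector<const Node *> kids;  // children, possibly shared
};

// Flatten the tree into a string; shared children are expanded every time.
void print(const Node &n, std::string &out) {
  out += n.text;
  for (const Node *k : n.kids)
    print(*k, out);
}

// Build a chain in which each level refers to the previous level twice.
// The vector owns levels + 1 nodes in total.
std::vector<std::unique_ptr<Node>> buildChain(int levels) {
  std::vector<std::unique_ptr<Node>> nodes;
  nodes.push_back(std::make_unique<Node>(Node{"T", {}}));
  for (int i = 0; i < levels; ++i) {
    const Node *prev = nodes.back().get();
    nodes.push_back(std::make_unique<Node>(Node{"", {prev, prev}}));
  }
  return nodes;
}
```

With 10 levels the chain holds 11 nodes, but printing the last one emits 1024 characters. A consumer that only inspects the nodes it cares about never pays for that expansion.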
Any thoughts? I'm really not at all familiar with LLDB, so I could
have this all wrong!
AST sounds great. We can put this into the class we use to chop up
C++ names as that is really our goal.
So it would be great to do a speed comparison between our fast
demangler in LLDB (in FastDemangle.cpp/.h) and your updated
libcxxabi version. If yours is faster, remove FastDemangle and then
update llvm::itaniumDemangle() to use your new code.
ASTs would be great for the C++ name parser.
Let us know what you are thinking,
Hi Greg,
I'm almost finished with my work on the demangler; hopefully I'll be
done within a few weeks. Once that's all finished I'll look into
exporting the AST and comparing it to FastDemangle. I was thinking
about adding a version of llvm::itaniumDemangle() that returns an opaque
handle to the AST and defining some functions on the LLVM side that
take that handle and return some extra information. I'd be happy to
help out with the LLDB side of things too, although it might be
better if someone more experienced with LLDB did this.
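As a rough illustration of the handle-plus-query-functions idea, the sketch below parses only the simplest Itanium nested names (of the form _ZN followed by length-prefixed identifiers and a closing E) and answers questions from the parsed structure without building the full demangled string. Every name here (ParsedName, parseNestedName, baseName, context) is hypothetical and is not an existing LLVM API. In a real interface the struct definition would be hidden behind the handle. Templates, substitutions, constructors, and everything else in the mangling grammar are deliberately unsupported and yield a null result.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical parsed form of a nested name such as _ZN3foo3bar3bazEv.
struct ParsedName {
  std::vector<std::string> components;  // e.g. {"foo", "bar", "baz"}
  bool isFunction = false;              // true if a parameter encoding follows
};

static bool isDigit(char c) {
  return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Parse "_ZN" (<length><identifier>)+ "E" [parameters]; anything else fails.
std::unique_ptr<ParsedName> parseNestedName(const std::string &m) {
  if (m.compare(0, 3, "_ZN") != 0)
    return nullptr;
  auto ast = std::make_unique<ParsedName>();
  std::size_t pos = 3;
  while (pos < m.size() && isDigit(m[pos])) {
    std::size_t len = 0;
    while (pos < m.size() && isDigit(m[pos]))
      len = len * 10 + static_cast<std::size_t>(m[pos++] - '0');
    if (len == 0 || len > m.size() - pos)
      return nullptr;                   // malformed length prefix
    ast->components.push_back(m.substr(pos, len));
    pos += len;
  }
  if (ast->components.empty() || pos >= m.size() || m[pos] != 'E')
    return nullptr;
  // In the Itanium scheme a function name is followed by its parameter
  // types, while a data name ends right after the closing 'E'.
  ast->isFunction = pos + 1 < m.size();
  return ast;
}

// Last component, e.g. "baz".
std::string baseName(const ParsedName &n) { return n.components.back(); }

// Everything before the last component, e.g. "foo::bar::".
std::string context(const ParsedName &n) {
  std::string out;
  for (std::size_t i = 0; i + 1 < n.components.size(); ++i)
    out += n.components[i] + "::";
  return out;
}
```

For _ZN3foo3bar3bazEv this gives the base name "baz", the context "foo::bar::", and classifies the symbol as a function, all without printing the parameter list.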
Can't wait! The only reason we switched away from the libcxxabi
demangler in the first place was the poor performance. GDB's demangler
was 3x faster. Our FastDemangler got us back to the speed of the GDB
demangler. But it will be great to get back to one fast demangler.
It would be great if there was some way to implement the demangled
name size cutoff in the demangler where if the demangled name goes
over some max size we can just stop demangling. No one needs to see a
72MB string, nor would anyone ever type in that name.
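One way such a cutoff could work, sketched on a toy AST rather than on the real demangler: the printer checks the output size before appending each piece and unwinds as soon as the limit would be exceeded. NameNode and printBounded are hypothetical names, and the actual demangler's output buffer may be organized quite differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy AST node (hypothetical); shared children stand in for back references.
struct NameNode {
  std::string text;
  std::vector<const NameNode *> kids;
};

// Append the flattened name to `out`, giving up as soon as the result would
// grow beyond maxSize. Returns false if the cutoff was hit, in which case
// `out` holds a truncated prefix no longer than maxSize.
bool printBounded(const NameNode &n, std::string &out, std::size_t maxSize) {
  if (out.size() + n.text.size() > maxSize)
    return false;
  out += n.text;
  for (const NameNode *k : n.kids)
    if (!printBounded(*k, out, maxSize))
      return false;                     // propagate the cutoff upwards
  return true;
}
```

Because the check happens before each append, the work done is bounded by the limit rather than by the size of the fully expanded name.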
If you can get the new demangler features (AST + demangling) into
llvm::itaniumDemangle I will be happy to do the LLDB side of the work.
I'll ping this thread when I'm finished with the demangler, then we
can hopefully work o