Re: [lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

Greg Clayton via lldb-dev Fri, 26 Jan 2018 12:57:32 -0800

> On Jan 26, 2018, at 8:38 AM, Erik Pilkington <[email protected]> 
> wrote:
> 
> 
> 
> On 2018-01-25 1:58 PM, Greg Clayton wrote:
>>> On Jan 25, 2018, at 10:25 AM, Erik Pilkington <[email protected]> 
>>> wrote:
>>> 
>>> Hi,
>>> I'm not at all familiar with LLDB, but I've been doing some work on the 
>>> demangler in libcxxabi. It's still a work in progress and I haven't yet 
>>> copied the changes over to ItaniumDemangle, which AFAIK is what lldb uses. 
>>> The demangler in libcxxabi now demangles the symbol you attached in 3.31 
>>> seconds, instead of 223.54 on my machine. I posted an RFC on my work here 
>>> (http://lists.llvm.org/pipermail/llvm-dev/2017-June/114448.html), but 
>>> basically the new demangler just produces an AST then traverses it to print 
>>> the demangled name.
>> Great to hear the huge speedup in demangling! LLDB actually has two 
>> demanglers: a fast one that can demangle 99% of names, and we fall back to 
>> ItaniumDemangle which can do all names but is really slow. It would be fun 
>> to compare your new demangler with the fast one and see if we can get rid of 
>> the fast demangler now.
>>> 
>>> I think a good way of making this even faster is to have LLDB consume the 
>>> AST the demangler produces directly. The AST is a better representation of 
>>> the information that LLDB wants, and finishing the demangle and then 
>>> fishing out that information from the output string is unfortunate. From 
>>> the AST, it would be really straightforward to just individually print all 
>>> the components of the name that LLDB wants.
>> This would help us to grab the important bits out of the mangled name as 
>> well. We chop up a demangled name to find the base name (string for 
>> std::string), containing context (std:: for std::string) and we check if we 
>> can tell if the function is a method (look for trailing "const" modifier on 
>> the function) versus a top level function, since the mangling doesn't fully 
>> specify what is a namespace and what is a class (in "foo::bar::baz()" we 
>> don't know if "foo" or "bar" are classes or namespaces). So the AST would 
>> be great as long as it is fast.
>> 
>>> Most of the time it takes to demangle these "symbols from hell" is during 
>>> the printing, after the AST has been parsed, because the demangler has to 
>>> flatten out all the potentially nested back references. Just parsing to an 
>>> AST should be about proportional to the strlen of the mangled name. Since 
>>> (AFAIK) LLDB doesn't use some sections of the demangled name often (such as 
>>> parameters), from the AST LLDB could lazily decide not to even bother fully 
>>> demangling some sections of the name, then if it ever needs them it could 
>>> parse a new AST and get them from there. I think this would largely fix the 
>>> issue, as most of the time these crazy expansions don't occur in the name 
>>> itself, but in the parameters or return type. Even when they do appear in 
>>> the name, it would be possible to do some simple name classification (ie, 
>>> does this symbol refer to a function) or pull out the basename quickly 
>>> without expanding anything at all.
>>> 
>>> Any thoughts? I'm really not at all familiar with LLDB, so I could have 
>>> this all wrong!
>> AST sounds great. We can put this into the class we use to chop up C++ names 
>> as that is really our goal.
>> 
>> So it would be great to do a speed comparison between our fast demangler in 
>> LLDB (in FastDemangle.cpp/.h) and your updated libcxxabi version. If yours 
>> is faster, remove FastDemangle and then update the llvm::ItaniumDemangle() 
>> to use your new code.
>> 
>> ASTs would be great for the C++ name parser.
>> 
>> Let us know what you are thinking,
> 
> Hi Greg,
> 
> I'm almost finished with my work on the demangler, hopefully I'll be done 
> within a few weeks. Once that's all finished I'll look into exporting the AST 
> and comparing it to FastDemangle. I was thinking about adding a version of 
> llvm::itaniumDemangle() that returns an opaque handle to the AST and defining 
> some functions on the LLVM side that take that handle and return some extra 
> information. I'd be happy to help out with the LLDB side of things too, 
> although it might be better if someone more experienced with LLDB did this.
>


Can't wait! The only reason we switched away from the libcxxabi demangler in 
the first place was the poor performance. GDB's demangler was 3x faster. Our 
FastDemangler got us back to the speed of the GDB demangler. But it will be 
great to get back to one fast demangler.

It would be great if there was some way to implement a demangled name size 
cutoff in the demangler, so that if the demangled name goes over some max size 
we can just stop demangling. No one needs to see a 72MB string, nor would anyone 
ever type in that name.
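
A minimal sketch of such a cutoff, assuming the printing phase writes through 
a single output buffer. BoundedOutput and expandRepeated are hypothetical names 
used only for illustration; they are not part of LLVM, libcxxabi, or LLDB.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not an LLVM/LLDB API): an output buffer that refuses
// to grow past a fixed limit, so printing can stop early.
class BoundedOutput {
public:
  explicit BoundedOutput(std::size_t max_size) : max_size_(max_size) {
    assert(max_size > 0 && "limit must be positive");
  }

  // Appends text and returns true. If the limit would be exceeded, nothing is
  // appended, the buffer is marked as overflowed, and false is returned.
  bool append(const std::string &text) {
    if (overflowed_ || text.size() > max_size_ - buffer_.size()) {
      overflowed_ = true;
      return false;
    }
    buffer_.append(text);
    return true;
  }

  bool overflowed() const { return overflowed_; }
  const std::string &str() const { return buffer_; }

private:
  std::size_t max_size_;
  std::string buffer_;
  bool overflowed_ = false;
};

// Stand-in for the printing phase: writes `piece` `count` times, the way a
// nested back reference gets expanded over and over, and gives up at the
// first append that would cross the limit.
inline bool expandRepeated(BoundedOutput &out, const std::string &piece,
                           std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    if (!out.append(piece))
      return false;
  return true;
}
```

With a limit of 10, expanding "abc" a thousand times stops after three copies 
and reports failure, so the caller can fall back to the mangled name instead of 
building a multi-megabyte string.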

If you can get the new demangler features (AST + demangling) into 
llvm::itaniumDemangle, I will be happy to do the LLDB side of the work.

> I'll ping this thread when I'm finished with the demangler, then we can 
> hopefully work out what a good API for LLDB would be.

It would be great to put all the functionality into LLVM and test the 
functionality in llvm tests. Then I will port over to LLDB as needed. As Jim 
said, we want to know the function basename, and whether a function is a C++ 
method, just a top level function, or possibly either. We often can't tell just 
from the mangling if foo::bar() is a method or a function, since we don't know 
if "foo" is a namespace or a class, but if we have "foo::bar() const", then we 
know it is a method.
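
To make the classification concrete, here is a rough string-based sketch of 
pulling the basename and context out of an already demangled name and checking 
for the trailing "const". NameParts and splitDemangledName are hypothetical 
names, not LLDB's actual name parser, and the sketch assumes simple names with 
no templates, operators, or anonymous namespaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for illustration only.
struct NameParts {
  std::string context;          // "foo" for "foo::bar() const"
  std::string basename;         // "bar" for "foo::bar() const"
  bool is_const_method = false; // true when the name ends in " const"
};

// Splits a simple demangled name into context and basename, and records
// whether it carries a trailing "const" qualifier. Without that qualifier the
// name may still be a method; the string alone cannot tell.
inline NameParts splitDemangledName(const std::string &demangled) {
  assert(!demangled.empty() && "expected a demangled name");
  NameParts parts;

  const std::string suffix = " const";
  parts.is_const_method =
      demangled.size() >= suffix.size() &&
      demangled.compare(demangled.size() - suffix.size(), suffix.size(),
                        suffix) == 0;

  // Drop the parameter list and anything after it. If there is no '(',
  // find() returns npos and substr() keeps the whole string.
  const std::string qualified = demangled.substr(0, demangled.find('('));

  // Everything before the last "::" is the containing context.
  const std::size_t sep = qualified.rfind("::");
  if (sep == std::string::npos) {
    parts.basename = qualified;
  } else {
    parts.context = qualified.substr(0, sep);
    parts.basename = qualified.substr(sep + 2);
  }
  return parts;
}
```

For "foo::bar() const" this yields context "foo", basename "bar", and a const 
method; for "foo::bar()" the method question stays open. An AST from the 
demangler would avoid this kind of string chopping.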

Look forward to seeing what you come up with!

Greg

> 
> Thanks,
> Erik
> 
>> Greg
>> 
>>> Thanks,
>>> Erik
>>> 
>>> 
>>> On 2018-01-24 6:48 PM, Greg Clayton via lldb-dev wrote:
>>>> I have an issue where I am debugging a C++ binary that is around 250MB in 
>>>> size. It contains some mangled names that are crazy:
>>>> 
>>>> _ZNK3shk6detail17CallbackPublisherIZNS_5ThrowERKNSt15__exception_ptr13exception_ptrEEUlOT_E_E9SubscribeINS0_9ConcatMapINS0_18CallbackSubscriberIZNS_6GetAllIiNS1_IZZNS_9ConcatMapIZNS_6ConcatIJNS1_IZZNS_3MapIZZNS_7IfEmptyIS9_EEDaS7_ENKUlS6_E_clINS1_IZZNS_4TakeIiEESI_S7_ENKUlS6_E_clINS1_IZZNS_6FilterIZNS_9ElementAtEmEUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZZNSL_ImEESI_S7_ENKUlS6_E_clINS1_IZNS_4FromINS0_22InfiniteRangeContainerIiEEEESI_S7_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EESI_S7_ENKUlS6_E_clIS14_EESI_S6_EUlS7_E_EERNS1_IZZNSH_IS9_EESI_S7_ENKSK_IS14_EESI_S6_EUlS7_E0_EEEEESI_DpOT_EUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZNS_5StartIJZNS_4JustIJS19_S1C_EEESI_S1F_EUlvE_ZNS1K_IJS19_S1C_EEESI_S1F_EUlvE0_EEESI_S1F_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESt6vectorIS6_SaIS6_EERKT0_NS_12ElementCountEbEUlS7_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlOS3_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlvE_EES1G_S1O_E25ConcatMapValuesSubscriberEEEDaS7_
>>>> 
>>>> This de-mangles to something that is 72MB in size and takes 280 seconds 
>>>> (try running "time c++filt -n" on the above string).
>>>> 
>>>> There are probably many symbols like this in this binary. Currently lldb 
>>>> will de-mangle all names in the symbol table so that we can chop up the 
>>>> names so we know function base names and we might be able to classify a 
>>>> base name as a method or function for breakpoint categorization.
>>>> 
>>>> My question is: how do we work around such issues in LLDB? A few 
>>>> solutions I can think of:
>>>> 1 - time each name demangle and if it takes too long somehow stop 
>>>> de-mangling similar symbols or symbols over a certain length?
>>>> 2 - allow a setting that says "don't de-mangle names that start with..." 
>>>> and the setting has a list of prefixes.
>>>> 3 - have a setting that turns off de-mangling symbols over a certain 
>>>> length all of the time with a default of something like 256 or 512
>>>> 4 - modify our FastDemangler to abort if the de-mangled string goes over a 
>>>> certain limit to avoid bad cases like this...
>>>> 
>>>> #1 would still mean we get a huge delay (like 280 seconds) when starting 
>>>> to debug this binary, but might prevent multiple symbols from adding to 
>>>> that delay...
>>>> 
>>>> #2 would require debugging once and then knowing which symbols 
>>>> took a while to de-mangle. If we time each de-mangle, we can warn that 
>>>> there are large mangled names and print the mangled name so the user might 
>>>> know?
>>>> 
>>>> #3 would disable de-mangling of long names at the risk of not de-mangling 
>>>> names that are close to the limit
>>>> 
>>>> #4 requires that our FastDemangle code can decode the mangled string. The 
>>>> fast de-mangler currently aborts on tricky de-mangling and we fall back 
>>>> onto cxa_demangle from the C++ library, which does not have a cutoff on 
>>>> length...
>>>> 
>>>> Can anyone else think of any other solutions?
>>>> 
>>>> Greg Clayton
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> lldb-dev mailing list
>>>> [email protected]
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

