Re: [lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

2018-01-25 Thread Pavel Labath via lldb-dev
The mangled name length threshold would be the easiest to implement.
However, I fear we may not be able to find a good cutoff length,
because it's not the length of it that matters, but the number (and
recursiveness) of back-references. For example, I was able to find a
mangled name of 757 characters in lldb:
_ZN12lldb_private23ScriptInterpreterPython21InitializeInterpreterEPFvvEPFbPKcS4_RKSt10shared_ptrINS_10StackFrameEERKS5_INS_18BreakpointLocationEEEPFbS4_S4_S9_RKS5_INS_10WatchpointEEEPFbS4_PvRKNS_10SharingPtrINS_11ValueObjectEEEPSM_RKS5_INS_18TypeSummaryOptionsEERNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcPFSM_S4_S4_SR_EPFSM_S4_S4_S5_INS_8DebuggerEEEPFmSM_jEPFSM_SM_jEPFiSM_S4_EPFSM_SM_EPFSP_SM_EPFbSM_ES1N_S1J_PFbS4_S4_RS19_S4_RNS_19CommandReturnObjectES5_INS_19ExecutionContextRefEEEPFbSM_S1O_S4_S1Q_S1S_EPFbS4_S4_S1O_EPFSM_S4_S4_RKS5_INS_7ProcessEEEPFbS4_S4_RS20_S13_EPFbS4_S4_RS5_INS_6ThreadEES13_EPFbS4_S4_RS5_INS_6TargetEES13_EPFbS4_S4_RS7_S13_EPFbS4_S4_RSP_S13_EPFSM_SM_S4_RKS2E_EPFSM_S4_S4_RKS5_INS_10ThreadPlanEEEPFbSM_S4_PNS_5EventERbE

This demangles to a string of length 2534, and I think it would be good to
handle it. On the other hand, I was able to produce a mangled name of
only 168 characters:
_ZN1BIS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IS_IiiES0_ES1_ES2_ES3_ES4_ES5_ES6_ES7_ES8_ES9_ESA_ESB_ESC_ESD_ESE_ESF_ESG_ESH_ESI_ESJ_ESK_ESL_E1fEv
which demangles to a 70MB string. (It takes about 3 seconds to compile
a file with this symbol and 0.8s to demangle it.)
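
To make the growth concrete, here is a toy model, not a demangler. It assumes
each nesting level prints as B<X, X>, where the second argument is a
back-reference to the first, so every level doubles the output. The function
names are made up for illustration, and the depth of 23 is only my reading of
the symbol above.

```python
def expand(depth, leaf="int"):
    """Build the printed form of B<X, X> nested `depth` times (small depths only)."""
    text = leaf
    for _ in range(depth):
        text = "B<" + text + ", " + text + ">"
    return text


def expanded_length(depth, leaf="int"):
    """Same value as len(expand(depth)), computed without building the string."""
    length = len(leaf)
    for _ in range(depth):
        # "B<" + X + ", " + X + ">" wraps two copies of X in 5 extra characters.
        length = 2 * length + 5
    return length


if __name__ == "__main__":
    for depth in (1, 2, 3, 23):
        print(depth, expanded_length(depth))
```

At depth 23 the model gives roughly 67 million characters, the same order of
magnitude as the 70MB figure; the exact size depends on how a given demangler
spaces its output.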

So we may need a limit on the output buffer size instead, but this
will require cooperation from the demangling library. Fortunately, all
targets nowadays use either the "fast" demangler or
llvm::itaniumDemangle by default, which we can modify to add a
threshold like this.
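
A minimal sketch of what an output-size threshold could look like, assuming
the printer writes through a buffer that refuses to grow past a limit. This is
not the FastDemangle or llvm::itaniumDemangle code; BoundedOutput,
OutputLimitExceeded, print_nested and demangle_with_limit are hypothetical
names, and the "demangler" is just the doubling B<X, X> pattern.

```python
class OutputLimitExceeded(Exception):
    """Raised when the printed name would exceed the configured limit."""


class BoundedOutput:
    """Append-only text buffer with a hard cap on total size."""

    def __init__(self, limit):
        self.limit = limit
        self.parts = []
        self.size = 0

    def append(self, text):
        if self.size + len(text) > self.limit:
            raise OutputLimitExceeded(self.limit)
        self.parts.append(text)
        self.size += len(text)

    def getvalue(self):
        return "".join(self.parts)


def print_nested(depth, out):
    """Emit B<X, X> nested `depth` times, re-expanding X like a back-reference."""
    if depth == 0:
        out.append("int")
        return
    out.append("B<")
    print_nested(depth - 1, out)
    out.append(", ")
    print_nested(depth - 1, out)
    out.append(">")


def demangle_with_limit(depth, limit):
    """Return the printed name, or None if it would exceed `limit` characters."""
    out = BoundedOutput(limit)
    try:
        print_nested(depth, out)
    except OutputLimitExceeded:
        return None
    return out.getvalue()


if __name__ == "__main__":
    print(demangle_with_limit(2, 1024))
    print(demangle_with_limit(23, 1024))  # gives up after about 1024 characters
```

The point of putting the check in the buffer is that the abort happens after
a bounded amount of work, regardless of how short the mangled name is.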

pl



On 25 January 2018 at 00:17, Greg Clayton via lldb-dev
 wrote:
>
> On Jan 24, 2018, at 4:14 PM, Zachary Turner  wrote:
>
> That's true, but shouldn't it be possible to demangle up until the last
> point you got something meaningful?  (I don't know the details of itanium
> mangling, just assuming this is possible)
>
>
> Anywhere you cut the string, many things can go wrong. I think this would
> fall under "start to demangle the string and if the output buffer goes
> over a certain length, abort the demangling", which is solution #4 from my
> original email.
>
>
> On Wed, Jan 24, 2018 at 3:54 PM Greg Clayton  wrote:
>>
>> If you just cut off the string, then it might not demangle without an
>> error if you truncate the mangled string at a specific point...
>>
>> On Jan 24, 2018, at 3:52 PM, Zachary Turner  wrote:
>>
>> What about doing a partial demangle?   Take at most 1024 (for example)
>> characters from the mangled name, demangle that, and then display ... at the
>> end.
>>
>> On Wed, Jan 24, 2018 at 3:48 PM Greg Clayton via lldb-dev
>>  wrote:
>>>
>>> I have an issue where I am debugging a C++ binary that is around 250MB in
>>> size. It contains some mangled names that are crazy:
>>>
>>>
>>> _ZNK3shk6detail17CallbackPublisherIZNS_5ThrowERKNSt15__exception_ptr13exception_ptrEEUlOT_E_E9SubscribeINS0_9ConcatMapINS0_18CallbackSubscriberIZNS_6GetAllIiNS1_IZZNS_9ConcatMapIZNS_6ConcatIJNS1_IZZNS_3MapIZZNS_7IfEmptyIS9_EEDaS7_ENKUlS6_E_clINS1_IZZNS_4TakeIiEESI_S7_ENKUlS6_E_clINS1_IZZNS_6FilterIZNS_9ElementAtEmEUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZZNSL_ImEESI_S7_ENKUlS6_E_clINS1_IZNS_4FromINS0_22InfiniteRangeContainerIiSI_S7_EUlS7_E_SI_S6_EUlS7_E_SI_S6_EUlS7_E_SI_S6_EUlS7_E_SI_S6_EUlS7_E_EESI_S7_ENKUlS6_E_clIS14_EESI_S6_EUlS7_E_EERNS1_IZZNSH_IS9_EESI_S7_ENKSK_IS14_EESI_S6_EUlS7_E0_ESI_DpOT_EUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZNS_5StartIJZNS_4JustIJS19_S1C_EEESI_S1F_EUlvE_ZNS1K_IJS19_S1C_EEESI_S1F_EUlvE0_EEESI_S1F_EUlS7_E_SI_S6_EUlS7_E_St6vectorIS6_SaIS6_EERKT0_NS_12ElementCountEbEUlS7_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlOS3_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlvE_EES1G_S1O_E25ConcatMapValuesSubscriberEEEDaS7_
>>>
>>> This de-mangles to something that is 72MB in size and takes 280 seconds
>>> (try running "time c++filt -n" on the above string).
>>>
>>> There are probably many symbols like this in this binary. Currently lldb
>>> will de-mangle all names in the symbol table so that we can chop up the
>>> names so we know function base names and we might be able to classify a base
>>> name as a method or function for breakpoint categorization.
>>>
>>> My question is: how do we work around such issues in LLDB? A few
>>> solutions I can think of:
>>> 1 - time each name demangle and if it takes too long somehow stop
>>> de-mangling similar symbols or symbols over a certain length?
>>> 2 - allow a setting that says "don't de-mangle names that start with..."
>>> and the setting has a list of prefixes.
>>> 3 - have a setting that turns off de-mangling symbols over a certain
>>> length all of the time with a default of something like 256 or 512
>>> 4 - modify our FastDemangler to abort if the de-mangled string goes over
>>> a certain limit

Re: [lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

2018-01-25 Thread Erik Pilkington via lldb-dev

Hi,
I'm not at all familiar with LLDB, but I've been doing some work on the 
demangler in libcxxabi. It's still a work in progress and I haven't yet 
copied the changes over to ItaniumDemangle, which AFAIK is what lldb 
uses. The demangler in libcxxabi now demangles the symbol you attached 
in 3.31 seconds, instead of 223.54 seconds on my machine. I posted an RFC on my 
work here 
(http://lists.llvm.org/pipermail/llvm-dev/2017-June/114448.html), but 
basically the new demangler just produces an AST then traverses it to 
print the demangled name.


I think a good way of making this even faster is to have LLDB consume 
the AST the demangler produces directly. The AST is a better 
representation of the information that LLDB wants, and finishing the 
demangle and then fishing out that information from the output string is 
unfortunate. From the AST, it would be really straightforward to just 
individually print all the components of the name that LLDB wants.


Most of the time it takes to demangle these "symbols from hell" is 
during the printing, after the AST has been parsed, because the 
demangler has to flatten out all the potentially nested back references. 
Just parsing to an AST should be about proportional to the strlen of the 
mangled name. Since (AFAIK) LLDB doesn't use some sections of the 
demangled name often (such as parameters), from the AST LLDB could 
lazily decide not to even bother fully demangling some sections of the 
name, then if it ever needs them it could parse a new AST and get them 
from there. I think this would largely fix the issue, as most of the 
time these crazy expansions don't occur in the name itself, but in the 
parameters or return type. Even when they do appear in the name, it 
would be possible to do some simple name classification (i.e., does this 
symbol refer to a function) or pull out the basename quickly without 
expanding anything at all.
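
A rough sketch of that idea, assuming a much simplified tree. These node
classes are hypothetical and are not the libcxxabi AST. Back-references are
modelled as shared child nodes, so the tree stays small even when its printed
form would be enormous, and the lookup pieces can be read without printing the
parameters.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Template:
    """A type such as B<X, Y>; arguments may be shared with other nodes."""
    name: str
    args: List["TypeNode"]


TypeNode = Union[str, Template]


@dataclass
class FunctionSymbol:
    scopes: List[str]                # enclosing namespaces/classes, outermost first
    base_name: str                   # unqualified function name
    params: List[TypeNode] = field(default_factory=list)


def lookup_keys(symbol):
    """Return (context, base name) without touching the parameter types."""
    return "::".join(symbol.scopes), symbol.base_name


def print_type(node):
    """Full expansion; the cost follows the printed size, not the node count."""
    if isinstance(node, str):
        return node
    return node.name + "<" + ", ".join(print_type(a) for a in node.args) + ">"


def nested(depth):
    """Build B<X, X> nested `depth` times using shared child nodes."""
    node = "int"
    for _ in range(depth):
        node = Template("B", [node, node])  # both arguments are the same object
    return node


if __name__ == "__main__":
    symbol = FunctionSymbol(["shk", "detail"], "Subscribe", [nested(40)])
    print(lookup_keys(symbol))       # the parameter is never expanded
    print(print_type(nested(2)))     # small enough to print in full
```

Here nested(40) holds only 40 Template nodes, while printing it in full would
produce on the order of 2^40 characters.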


Any thoughts? I'm really not at all familiar with LLDB, so I could have 
this all wrong!


Thanks,
Erik


On 2018-01-24 6:48 PM, Greg Clayton via lldb-dev wrote:

I have an issue where I am debugging a C++ binary that is around 250MB in size. 
It contains some mangled names that are crazy:

_ZNK3shk6detail17CallbackPublisherIZNS_5ThrowERKNSt15__exception_ptr13exception_ptrEEUlOT_E_E9SubscribeINS0_9ConcatMapINS0_18CallbackSubscriberIZNS_6GetAllIiNS1_IZZNS_9ConcatMapIZNS_6ConcatIJNS1_IZZNS_3MapIZZNS_7IfEmptyIS9_EEDaS7_ENKUlS6_E_clINS1_IZZNS_4TakeIiEESI_S7_ENKUlS6_E_clINS1_IZZNS_6FilterIZNS_9ElementAtEmEUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZZNSL_ImEESI_S7_ENKUlS6_E_clINS1_IZNS_4FromINS0_22InfiniteRangeContainerIiSI_S7_EUlS7_E_SI_S6_EUlS7_E_SI_S6_EUlS7_E_SI_S6_EUlS7_E_SI_S6_EUlS7_E_EESI_S7_ENKUlS6_E_clIS14_EESI_S6_EUlS7_E_EERNS1_IZZNSH_IS9_EESI_S7_ENKSK_IS14_EESI_S6_EUlS7_E0_ESI_DpOT_EUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZNS_5StartIJZNS_4JustIJS19_S1C_EEESI_S1F_EUlvE_ZNS1K_IJS19_S1C_EEESI_S1F_EUlvE0_EEESI_S1F_EUlS7_E_SI_S6_EUlS7_E_St6vectorIS6_SaIS6_EERKT0_NS_12ElementCountEbEUlS7_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlOS3_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlvE_EES1G_S1O_E25ConcatMapValuesSubscriberEEEDaS7_

This de-mangles to something that is 72MB in size and takes 280 seconds (try running 
"time c++filt -n" on the above string).

There are probably many symbols like this in this binary. Currently lldb will 
de-mangle all names in the symbol table so that we can chop up the names so we 
know function base names and we might be able to classify a base name as a 
method or function for breakpoint categorization.

My question is: how do we work around such issues in LLDB? A few solutions I 
can think of:
1 - time each name demangle and if it takes too long somehow stop de-mangling 
similar symbols or symbols over a certain length?
2 - allow a setting that says "don't de-mangle names that start with..." and 
the setting has a list of prefixes.
3 - have a setting that turns off de-mangling symbols over a certain length all 
of the time with a default of something like 256 or 512
4 - modify our FastDemangler to abort if the de-mangled string goes over a 
certain limit to avoid bad cases like this...

#1 would still mean we get a huge delay (like 280 seconds) when starting to 
debug this binary, but might prevent multiple symbols from adding to that 
delay...

#2 would require debugging once and then knowing which symbols took a 
while to de-mangle. If we time each de-mangle, we can warn that there are large 
mangled names and print the mangled name so the user might know?

#3 would disable de-mangling of long names at the risk of not de-mangling names 
that are close to the limit

#4 requires that our FastDemangle code can decode the mangled string. 
The fast de-mangler currently aborts on tricky de-mangling and we fall back 
onto cxa_demangle from the C++ library, which does not have a cutoff on 
length...
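
The timing idea from #1 and #2 could be sketched roughly as below. This is a
hedged illustration rather than LLDB code: demangle_all_timed is a made-up
name, the demangler is passed in as a plain callable, and the one-second
threshold is arbitrary.

```python
import time


def demangle_all_timed(names, demangle, slow_seconds=1.0, clock=time.perf_counter):
    """Demangle every name and report the ones that took too long.

    Returns (results, slow) where results maps mangled -> demangled and slow
    is a list of (mangled name, elapsed seconds) pairs to warn the user about.
    """
    results = {}
    slow = []
    for name in names:
        start = clock()
        results[name] = demangle(name)
        elapsed = clock() - start
        if elapsed > slow_seconds:
            slow.append((name, elapsed))
    return results, slow


if __name__ == "__main__":
    # str.upper stands in for a real demangler here.
    results, slow = demangle_all_timed(["_Z1fv", "_Z1gv"], str.upper)
    for name, elapsed in slow:
        print("slow to demangle (%.1fs): %s" % (elapsed, name))
```

As noted under #1, this only reports the cost after it has been paid; the
slow list would have to be turned into a prefix setting for the next run.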

Can anyone else think of any other solutions?

Greg Clayton







Re: [lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

2018-01-25 Thread Jim Ingham via lldb-dev
I must admit I've never played around with C++ demangling, but I wonder if our 
purposes in demangling might inform how we do this?

We use demangled names for a couple of purposes.  One is to print names in 
backtraces and thread reporting when we stop.  For the most part, the requests 
we've gotten here are that the full demangled names are too noisy and 
impossible to read, and that we need to cut them down for usability's sake.  For 
instance, we added a display mode to the Swift demangler so that backtraces 
were actually useful.  But in any case, this part can be done lazily when a 
name shows up in a backtrace, and so is not so performance sensitive.

The other reason we use them is to allow the various name lookups to work with 
human-level names (often partially specialized) and find their way to the 
actual symbols.  This is generally why we have to do mass demangling of symbols 
when we read in a module.  Having a full demangled name here does allow folks 
to specify a particular overload (for setting breakpoints, etc.) but that part 
of our symbol lookups is more frustrating than helpful because you have to know 
pretty much exactly how the compiler spelled the demangled name, at which point 
it's generally easier just to use the mangled name.

So I wonder if it wouldn't be possible to make a demangler that doesn't attempt 
full fidelity, but rather is crafted to pick out the pieces that we actually 
need and use to do heuristic name matches, and then we could use the faithful 
demangler when we are intentionally presenting a name - at which point the 
speed will be much less important.
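
As a very rough sketch of such a reduced-fidelity pass, the function below
reads only the simplest Itanium form, _ZN followed by length-prefixed
identifiers and a closing E, and gives up on everything else. The name
quick_nested_name is made up, and a real version would need to cope with
templates, substitutions, operators and so on.

```python
DIGITS = "0123456789"


def quick_nested_name(mangled):
    """Return (scopes, base_name, is_const) for plain _ZN<len><id>...E names.

    Returns None for anything else (templates, substitutions, operators, ...),
    in which case a caller would fall back to a full demangler.
    """
    if not mangled.startswith("_ZN"):
        return None
    pos = 3
    is_const = False
    # Optional CV-qualifiers of a member function: r (restrict), V, K (const).
    while pos < len(mangled) and mangled[pos] in "rVK":
        is_const = is_const or mangled[pos] == "K"
        pos += 1
    parts = []
    while pos < len(mangled) and mangled[pos] in DIGITS:
        end = pos
        while end < len(mangled) and mangled[end] in DIGITS:
            end += 1
        length = int(mangled[pos:end])
        ident = mangled[end:end + length]
        if len(ident) != length:
            return None  # truncated input
        parts.append(ident)
        pos = end + length
    # A plain nested name must be closed by 'E' right after the identifiers.
    if not parts or pos >= len(mangled) or mangled[pos] != "E":
        return None
    return parts[:-1], parts[-1], is_const


if __name__ == "__main__":
    # Only the leading part of the 757-character lldb symbol quoted in this thread.
    prefix = "_ZN12lldb_private23ScriptInterpreterPython21InitializeInterpreterEPFvvE"
    print(quick_nested_name(prefix))
```

The scan never looks at the parameter encoding after the closing E, so its
cost does not depend on how the parameters would expand.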

I'm probably missing some uses of demangled names that might not make this 
possible, but it seems worth considering.

Jim


Re: [lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

2018-01-25 Thread Jim Ingham via lldb-dev
specialized -> specified

Jim


> On Jan 25, 2018, at 10:30 AM, Jim Ingham via lldb-dev 
>  wrote:
> 
> specialized



Re: [lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

2018-01-25 Thread Jim Ingham via lldb-dev
That's along the same lines as what I was thinking.  We really don't need to 
print all these names, and in fact the complicated ones are not useful for 
printing and certainly there are few times where you want to use them in their 
explicit forms.  We really just want to pick out pieces to put in our names 
tables for lookup.  So if we can get them in some kind of node form and then 
pull the bits we want out that might be a better way to go.

Jim


> On Jan 25, 2018, at 10:25 AM, Erik Pilkington via lldb-dev 
>  wrote:
> 
> I think a good way of making this even faster is to have LLDB consume the AST 
> the demangler produces directly. The AST is a better representation of the 
> information that LLDB wants, and finishing the demangle and then fishing out 
> that information from the output string is unfortunate. From the AST, it 
> would be really straightforward to just individually print all the components 
> of the name that LLDB wants.

Re: [lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

2018-01-25 Thread Greg Clayton via lldb-dev

> On Jan 25, 2018, at 10:25 AM, Erik Pilkington  
> wrote:
> 
> Hi,
> I'm not at all familiar with LLDB, but I've been doing some work on the 
> demangler in libcxxabi. It's still a work in progress and I haven't yet 
> copied the changes over to ItaniumDemangle, which AFAIK is what lldb uses. 
> The demangler in libcxxabi now demangles the symbol you attached in 3.31 
> seconds, instead of 223.54 on my machine. I posted a RFC on my work here 
> (http://lists.llvm.org/pipermail/llvm-dev/2017-June/114448.html), but 
> basically the new demangler just produces an AST then traverses it to print 
> the demangled name.

Great to hear about the huge speedup in demangling! LLDB actually has two 
demanglers: a fast one that can demangle 99% of names, and a fallback to 
ItaniumDemangle, which can handle all names but is really slow. It would be fun 
to compare your new demangler with the fast one and see if we can get rid of 
the fast demangler now.
> 
> 
> I think a good way of making this even faster is to have LLDB consume the AST 
> the demangler produces directly. The AST is a better representation of the 
> information that LLDB wants, and finishing the demangle and then fishing out 
> that information from the output string is unfortunate. From the AST, it 
> would be really straightforward to just individually print all the components 
> of the name that LLDB wants.

This would help us to grab the important bits out of the mangled name as well. 
We chop up a demangled name to find the base name ("string" for std::string) 
and the containing context ("std::" for std::string), and we check whether we 
can tell if the function is a method (by looking for a trailing "const" 
modifier on the function) versus a top-level function. The mangling doesn't 
fully specify what is a namespace and what is a class: in "foo::bar::baz()" we 
don't know if "foo" or "bar" are classes or namespaces. So the AST would be 
great as long as it is fast.
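
For illustration, the string chopping described above could look something
like the sketch below. This is not LLDB's C++ name parser; split_demangled is
a hypothetical helper that ignores return types, operators such as operator<
or operator(), and other corner cases.

```python
def split_demangled(name):
    """Split 'ctx::base(args) const' into (context, base_name, is_const)."""
    text = name.strip()
    is_const = text.endswith(" const")
    if is_const:
        text = text[: -len(" const")]
    # Drop the argument list by finding the '(' that matches the final ')'.
    if text.endswith(")"):
        depth = 0
        for i in range(len(text) - 1, -1, -1):
            if text[i] == ")":
                depth += 1
            elif text[i] == "(":
                depth -= 1
                if depth == 0:
                    text = text[:i]
                    break
    # Find the last '::' that is not inside template angle brackets.
    depth = 0
    split_at = -1
    for i in range(len(text) - 1, 0, -1):
        ch = text[i]
        if ch == ">":
            depth += 1
        elif ch == "<":
            depth -= 1
        elif ch == ":" and text[i - 1] == ":" and depth == 0:
            split_at = i - 1
            break
    if split_at < 0:
        return "", text, is_const
    return text[:split_at], text[split_at + 2:], is_const


if __name__ == "__main__":
    print(split_demangled("foo::bar::baz()"))
    print(split_demangled("std::vector<std::pair<int, int>>::size() const"))
```

Note that this needs the complete demangled string as input, which is exactly
the expensive part for the symbols discussed in this thread.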

> Most of the time it takes to demangle these "symbols from hell" is during the 
> printing, after the AST has been parsed, because the demangler has to flatten 
> out all the potentially nested back references. Just parsing to an AST should 
> be about proportional to the strlen of the mangled name. Since (AFAIK) LLDB 
> doesn't use some sections of the demangled name often (such as parameters), 
> from the AST LLDB could lazily decide not to even bother fully demangling 
> some sections of the name, then if it ever needs them it could parse a new 
> AST and get them from there. I think this would largely fix the issue, as 
> most of the time these crazy expansions don't occur in the name itself, but 
> in the parameters or return type. Even when they do appear in the name, it 
> would be possible to do some simple name classification (ie, does this symbol 
> refer to a function) or pull out the basename quickly without expanding 
> anything at all.
> 
> Any thoughts? I'm really not at all familiar with LLDB, so I could have this 
> all wrong!

AST sounds great. We can put this into the class we use to chop up C++ names, as 
that is really our goal.

So it would be great to do a speed comparison between our fast demangler in 
LLDB (in FastDemangle.cpp/.h) and your updated libcxxabi version. If yours is 
faster, we can remove FastDemangle and then update llvm::itaniumDemangle() to 
use your new code.
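
One possible shape for such a comparison, sketched with stand-in callables
since the real demanglers are C++ functions. The benchmark helper and the
labels are assumptions for illustration; it reports the best of a few
wall-clock runs over the same list of names.

```python
import time


def benchmark(demanglers, names, repeat=3, clock=time.perf_counter):
    """Return {label: best time to demangle all names} for each demangler."""
    timings = {}
    for label, demangle in demanglers.items():
        best = float("inf")
        for _ in range(repeat):
            start = clock()
            for name in names:
                demangle(name)
            best = min(best, clock() - start)
        timings[label] = best
    return timings


if __name__ == "__main__":
    # Stand-ins only; real runs would wrap the two C++ demanglers.
    candidates = {"fast": str.upper, "new": str.lower}
    for label, seconds in benchmark(candidates, ["_Z1fv"] * 1000).items():
        print("%-5s %.6fs" % (label, seconds))
```

Running it over the symbol table of a real binary, rather than a handful of
pathological names, would show whether the common case regresses.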

ASTs would be great for the C++ name parser.

Let us know what you are thinking,

Greg
