https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119387

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #18)
> trunk is now faster than GCC 13, using about the same amount of memory
> without -g and only slightly more with -g.  I consider the regression fixed
> on trunk sofar.
>
> It also hides the GC allocation issue somewhat.  The profile is now dominated
> by the C++ FE:
>
> Samples: 130K of event 'cycles:P', Event count (approx.): 137748911889
>
> Overhead       Samples  Command  Shared Object  Symbol
>
>   17.11%         22269  cc1plus  cc1plus        [.]
> find_substitution(tree_node*)

Some obvious improvement to

  /* Now check the list of available substitutions for this mangling
     operation.  */
  if (!abbr || tags)
    for (i = 0; i < size; ++i)
      if (tree candidate = (*G.substitutions)[i])
        {
          /* NODE is a matched to a candidate if it's the same decl node or
             if it's the same type.  */
          if (decl == candidate
              || (TYPE_P (candidate) && type && TYPE_P (node)
                  && same_type_p (type, candidate))
              || NESTED_TEMPLATE_MATCH (node, candidate))
            {
              write_substitution (i);
              return 1;
            }
        }

where all the time is spent would be to separate TYPE_P substitutions from
non-TYPE_P and/or "unswitch" the linear search over all candidates based
on type && TYPE_P (node).  Just performing the unswitching improves
compile-time by ~3%.
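
For illustration only, here is a minimal standalone sketch of what "unswitching" the search on type && TYPE_P (node) could look like. It is not the GCC code or any actual patch: Node, same_type, find_combined and find_unswitched are hypothetical stand-ins, same_type only models same_type_p with an integer key, and the NESTED_TEMPLATE_MATCH clause is left out to keep the example short.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tree node; this is not GCC's tree_node.
struct Node {
    bool is_type;   // plays the role of TYPE_P (node)
    int  type_key;  // equal keys model same_type_p () returning true
};

// Assumed model of same_type_p: structural equality, not pointer identity.
static bool same_type(const Node* a, const Node* b) {
    return a->type_key == b->type_key;
}

// Shape of the original loop: the loop-invariant test
// (type && node_is_type) is re-evaluated for every candidate.
int find_combined(const std::vector<const Node*>& subs,
                  const Node* decl, const Node* type, bool node_is_type) {
    for (std::size_t i = 0; i < subs.size(); ++i)
        if (const Node* cand = subs[i])
            if (decl == cand
                || (cand->is_type && type && node_is_type
                    && same_type(type, cand)))
                return static_cast<int>(i);
    return -1;  // no substitution found
}

// Unswitched variant: the invariant test is hoisted out of the loop, so
// the non-type case only performs pointer comparisons per candidate.
int find_unswitched(const std::vector<const Node*>& subs,
                    const Node* decl, const Node* type, bool node_is_type) {
    if (type && node_is_type) {
        for (std::size_t i = 0; i < subs.size(); ++i)
            if (const Node* cand = subs[i])
                if (decl == cand
                    || (cand->is_type && same_type(type, cand)))
                    return static_cast<int>(i);
    } else {
        for (std::size_t i = 0; i < subs.size(); ++i)
            if (const Node* cand = subs[i])
                if (decl == cand)
                    return static_cast<int>(i);
    }
    return -1;  // no substitution found
}
```

Both functions return the same index for the same inputs; the difference is only in how much work is done per candidate when the node being mangled is not a type. Splitting the substitutions into separate TYPE_P and non-TYPE_P lists would be a further step and is not shown here.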