Hi Eddy, On Fri, 2021-12-03 at 01:14 +0200, Eduard-Mihai Burtescu wrote: > On Fri, Dec 3, 2021, at 00:07, Mark Wielaard wrote: > > On Thu, Dec 02, 2021 at 07:35:17PM +0200, Eduard-Mihai Burtescu > > wrote: > > > That also means that for consistency, suffixes like these should > > > be > > > handled uniformly for both v0 and legacy (as rustc-demangle > > > does), > > > since LLVM doesn't distinguish. > > > > The problem with the legacy mangling is that dot '.' is a valid > > character. That is why the patch only handles the v0 mangling case > > (where dot '.' isn't valid). > > Thought so, that's an annoying complication - however, see later down > why that's still not a blocker to the way rustc-demangle handles it. > > > > You may even be able to get Clang to generate C++ mangled symbols > > > with ".llvm." suffixes, with enough application of LTO. This is > > > not > > > unlike GCC ".clone" suffixes, AFAIK. Sadly I don't think there's > > > a > > > way to handle both as "outside the symbol", without hardcoding > > > ".llvm." in the implementation. > > > > We could use the scheme used by c++ where the .suffix is added as " > > [clone .suffix]", it even handles multiple dots, where something > > like > > _Z3fooi.part.9.165493.constprop.775.31805 > > demangles to > > foo(int) [clone .part.9.165493] [clone .constprop.775.31805] > > > > I just don't think that is very useful and a little confusing. > > Calling it "clone" is a bit weird, but I just checked what rustc- > demangle > does for printing suffixes back out and it's not great either: > - ".llvm." (and everything after it) is completely removed > - any left-over suffixes (after demangling), if they start with ".", > are > not considered errors, but printed out verbatim after the > demangling > > > > I don't recall the libiberty demangling API having any provisions > > > for the demangler deciding that a mangled symbol "stops early", > > > which would maybe allow for a more general solution. > > > > No, there indeed is no interface. We might introduce a new option > > flag > > for treating '.' as end of symbol. But do we really need that > > flexibility? > > That's not what I meant - a v0 or legacy symbol is self-terminating > in > its parsing (or at the very least there are not dots allowed outside > of a length-prefixed identifier), so that when you see the start of > a valid mangled symbol, you can always find its end in the string, > even when that end is half-way through (and is followed by suffixes > or any other unrelated noise). > > What I was imagining is a way to return to the caller the number of > chars from the start of the original string, that were demangled, > letting the caller do something else with the rest of that string. > (see below for how rustc-demangle already does something similar) > > > > Despite all that, if it helps in practice, I would still not mind > > > this patch landing in its current form, I just wanted to share my > > > perspective on the larger issue. > > > > Thanks for that. Do you happen to know what other rust demanglers > > do? > > rustc-demangle's internal API returns a pair of the demangler and the > "leftover" parts of the original string, after the end of the symbol. > You can see here how that suffix is further checked, and kept: > https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/lib.rs#L108-L138
Yes, returning a struct that returns the style detected, the demangled string and the left over chars makes sense. But we don't have an interface like that at the moment, and I am not sure we (currently) have users who want this. > As mentioned above, ".llvm." is handled differently, just above the snippet > linked - perhaps it was deemed too common to let it pollute the output. But that also makes it a slightly odd interface. I would imagine that people would be interested in the .llvm. part. Now that just gets dropped. Since we don't have an interface to return the suffix (and I find the choice of dropping .llvm. but not other suffixes odd), I think we should just simply always drop the .suffix. I understand now how to do that for legacy symbols too, thanks for the hints. See the attached update to the patch. What do you think? Thanks, Mark
From b9ea4a1cc5d139a08fd04bd5c59dae1d1519f82b Mon Sep 17 00:00:00 2001 From: Mark Wielaard <m...@klomp.org> Date: Thu, 2 Dec 2021 18:00:39 +0100 Subject: [PATCH] libiberty rust-demangle, ignore .suffix Rust symbols can have a .suffix because of compiler transformations. These can be ignored in the demangled name. Which is what this patch implements. By stopping at the first dot for v0 symbols and searching backwards to the ending 'E' for legacy symbols. An alternative implementation could be to follow what C++ does and represent these as [clone .suffix] tagged onto the demangled name. But this seems somewhat confusing since it results in a demangled name that cannot be mangled again. And it would mean trying to decode compiler internal naming. https://bugs.kde.org/show_bug.cgi?id=445916 https://github.com/rust-lang/rust/issues/60705 libiberty/Changelog * rust-demangle.c (rust_demangle_callback): Ignore everything after '.' char in sym for v0. For legacy symbols search backwards to find the last 'E' before any '.'. * testsuite/rust-demangle-expected: Add new .suffix testcases. --- libiberty/rust-demangle.c | 15 ++++++++++++++- libiberty/testsuite/rust-demangle-expected | 9 +++++++++ 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/libiberty/rust-demangle.c b/libiberty/rust-demangle.c index 449941b56dc1..5c2b77f7a58e 100644 --- a/libiberty/rust-demangle.c +++ b/libiberty/rust-demangle.c @@ -1340,6 +1340,10 @@ rust_demangle_callback (const char *mangled, int options, /* Rust symbols (v0) use only [_0-9a-zA-Z] characters. */ for (p = rdm.sym; *p; p++) { + /* Rust v0 symbols can have '.' suffixes, ignore those. */ + if (rdm.version == 0 && *p == '.') + break; + rdm.sym_len++; if (*p == '_' || ISALNUM (*p)) @@ -1355,7 +1359,16 @@ rust_demangle_callback (const char *mangled, int options, /* Legacy Rust symbols need to be handled separately. */ if (rdm.version == -1) { - /* Legacy Rust symbols always end with E. */ + /* Legacy Rust symbols always end with E. But can be followed by a + .suffix (which we want to ignore). */ + int dot_suffix = 1; + while (rdm.sym_len > 0 && + !(dot_suffix && rdm.sym[rdm.sym_len - 1] == 'E')) + { + dot_suffix = rdm.sym[rdm.sym_len - 1] == '.'; + rdm.sym_len--; + } + if (!(rdm.sym_len > 0 && rdm.sym[rdm.sym_len - 1] == 'E')) return 0; rdm.sym_len--; diff --git a/libiberty/testsuite/rust-demangle-expected b/libiberty/testsuite/rust-demangle-expected index 7dca315d0054..eb73527b3189 100644 --- a/libiberty/testsuite/rust-demangle-expected +++ b/libiberty/testsuite/rust-demangle-expected @@ -295,3 +295,12 @@ _RMCs4fqI2P2rA04_13const_genericINtB0_4CharKc2202_E --format=auto _RNvNvMCs4fqI2P2rA04_13const_genericINtB4_3FooKpE3foo3FOO <const_generic::Foo<_>>::foo::FOO +# +# Suffixes +# +--format=rust +_RNvMs0_NtCs5l0EXMQXRMU_21rustc_data_structures17obligation_forestINtB5_16ObligationForestNtNtNtCsdozMG8X9FIu_21rustc_trait_selection6traits7fulfill26PendingPredicateObligationE22register_obligation_atB1v_.llvm.8517020237817239694 +<rustc_data_structures::obligation_forest::ObligationForest<rustc_trait_selection::traits::fulfill::PendingPredicateObligation>>::register_obligation_at +--format=rust +_ZN4core3ptr85drop_in_place$LT$std..rt..lang_start$LT$$LP$$RP$$GT$..$u7b$$u7b$closure$u7d$$u7d$$GT$17h27f14859c664490dE.llvm.8091179795805947855 +core::ptr::drop_in_place<std::rt::lang_start<()>::{{closure}}> \ No newline at end of file -- 2.18.4