Apologies for the delay, the email fell through the cracks somehow. The updated patch looks like it would work alright, only needs a couple tests, e.g.: https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/lib.rs#L413-L422 https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/v0.rs#L1442-L1444
Thanks, - Eddy B. On Tue, Dec 7, 2021, at 21:16, Mark Wielaard wrote: > Hi Eddy, > > On Fri, 2021-12-03 at 01:14 +0200, Eduard-Mihai Burtescu wrote: >> On Fri, Dec 3, 2021, at 00:07, Mark Wielaard wrote: >> > On Thu, Dec 02, 2021 at 07:35:17PM +0200, Eduard-Mihai Burtescu >> > wrote: >> > > That also means that for consistency, suffixes like these should >> > > be >> > > handled uniformly for both v0 and legacy (as rustc-demangle >> > > does), >> > > since LLVM doesn't distinguish. >> > >> > The problem with the legacy mangling is that dot '.' is a valid >> > character. That is why the patch only handles the v0 mangling case >> > (where dot '.' isn't valid). >> >> Thought so, that's an annoying complication - however, see later down >> why that's still not a blocker to the way rustc-demangle handles it. >> >> > > You may even be able to get Clang to generate C++ mangled symbols >> > > with ".llvm." suffixes, with enough application of LTO. This is >> > > not >> > > unlike GCC ".clone" suffixes, AFAIK. Sadly I don't think there's >> > > a >> > > way to handle both as "outside the symbol", without hardcoding >> > > ".llvm." in the implementation. >> > >> > We could use the scheme used by c++ where the .suffix is added as " >> > [clone .suffix]", it even handles multiple dots, where something >> > like >> > _Z3fooi.part.9.165493.constprop.775.31805 >> > demangles to >> > foo(int) [clone .part.9.165493] [clone .constprop.775.31805] >> > >> > I just don't think that is very useful and a little confusing. >> >> Calling it "clone" is a bit weird, but I just checked what rustc- >> demangle >> does for printing suffixes back out and it's not great either: >> - ".llvm." (and everything after it) is completely removed >> - any left-over suffixes (after demangling), if they start with ".", >> are >> not considered errors, but printed out verbatim after the >> demangling >> >> > > I don't recall the libiberty demangling API having any provisions >> > > for the demangler deciding that a mangled symbol "stops early", >> > > which would maybe allow for a more general solution. >> > >> > No, there indeed is no interface. We might introduce a new option >> > flag >> > for treating '.' as end of symbol. But do we really need that >> > flexibility? >> >> That's not what I meant - a v0 or legacy symbol is self-terminating >> in >> its parsing (or at the very least there are not dots allowed outside >> of a length-prefixed identifier), so that when you see the start of >> a valid mangled symbol, you can always find its end in the string, >> even when that end is half-way through (and is followed by suffixes >> or any other unrelated noise). >> >> What I was imagining is a way to return to the caller the number of >> chars from the start of the original string, that were demangled, >> letting the caller do something else with the rest of that string. >> (see below for how rustc-demangle already does something similar) >> >> > > Despite all that, if it helps in practice, I would still not mind >> > > this patch landing in its current form, I just wanted to share my >> > > perspective on the larger issue. >> > >> > Thanks for that. Do you happen to know what other rust demanglers >> > do? >> >> rustc-demangle's internal API returns a pair of the demangler and the >> "leftover" parts of the original string, after the end of the symbol. >> You can see here how that suffix is further checked, and kept: >> https://github.com/rust-lang/rustc-demangle/blob/2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/src/lib.rs#L108-L138 > > Yes, returning a struct that returns the style detected, the demangled > string and the left over chars makes sense. But we don't have an > interface like that at the moment, and I am not sure we (currently) > have users who want this. > >> As mentioned above, ".llvm." is handled differently, just above the snippet >> linked - perhaps it was deemed too common to let it pollute the output. > > But that also makes it a slightly odd interface. I would imagine that > people would be interested in the .llvm. part. Now that just gets > dropped. > > Since we don't have an interface to return the suffix (and I find the > choice of dropping .llvm. but not other suffixes odd), I think we > should just simply always drop the .suffix. I understand now how to do > that for legacy symbols too, thanks for the hints. > > See the attached update to the patch. What do you think? > > Thanks, > > Mark > > Attachments: > * 0001-libiberty-rust-demangle-ignore-.suffix.patch
