On Sat, 2017-05-06 at 13:30 +0200, Milian Wolff wrote:
> On Friday, 5 May 2017 15:06:48 CEST Mark Wielaard wrote:
> > On Thu, 2017-05-04 at 18:05 +0200, Milian Wolff wrote:
> > > I noticed that elfutils fails to handle clang binaries when we want to
> > > find a DIE for a certain address. I.e. dwfl_module_addrdie returns
> > > nullptr, and eu-addr2line fails to resolve inlined frames.
> > >
> > > To reproduce this:
> > >[...]
> > >
> > > This also affects us in our perfparser. Not being able to find a cudie
> > > means not finding inlined frames nor file/line mappings, which is quite a
> > > set-back.
> > >
> > > I have noticed that backward-cpp contains a (partially) work-around for
> > > this:
> > >
> > > https://github.com/bombela/backward-cpp/blob/master/backward.hpp#L1216
> >
> > O urgh how utterly broken (not backward-cpp, but the bogus DWARF clang
> > generates). As that comment says:
> >
> >     // Sadly clang does not generate the section .debug_aranges, thus
> >     // dwfl_module_addrdie will fail early. Clang doesn't either set
> >     // the lowpc/highpc/range info for every compilation unit.
> >     //
> >     // So in order to save the world:
> >     // for every compilation unit, we will iterate over every single
> >     // DIEs. Normally functions should have a lowpc/highpc/range, which
> >     // we will use to infer the compilation unit.
> >
> >     // note that this is probably badly inefficient.
> >
> > And indeed having to scan through every CU to find a matching function
> > DIE is badly inefficient :{
>
> But this code comment is relatively old. Are we sure it's really still the
> case?

If you were able to replicate it then yes.

> > > Is this the right approach and also what the non-eu addr2line does? If so,
> > > can that be added upstream too, such that dwfl_module_addrdie can be
> > > relied on?
> > >
> > > I've seen it on clang 3.6, 4 and 5. Neither passing -g3 nor
> > > -gdwarf-aranges helps.
> >
> > Thanks for reporting this. I think this might be the same issue seen
> > here: https://sourceware.org/bugzilla/show_bug.cgi?id=21247
> > ... or at least it seems related. The function/address not found in that
> > case also comes from a CU generated by clang. It does have a lowpc and
> > ranges, but the lowpc looks bogus (zero) and the ranges don't seem to
> > cover the function in question. So it seems even worse than your example
> > where there are no lowpc/ranges. We cannot even trust them if they are
> > there. Sigh.
>
> So the situation is different from the comment in backward-cpp...

Only in how the lowpc/ranges were broken. The core issue is that we
cannot rely on the lowpc/ranges (and aranges) being correct for a CU.
We assume the DWARF producer doesn't really feed us garbage, but
apparently clang does :{

> > I have to think about how to handle this. We clearly need something that
> > just ignores the lowpc/highpc/ranges on CUs and parses every CU till the
> > function/address DIE is found to know which CU and line_table to use.
> > But that is so inefficient that I don't want to do that by default.
>
> So, if this is really that bad - what are the binutils doing - does anyone
> know?

They scan every CU just in case. Which is terrible for performance.
Just compare binutils addr2line vs elfutils eu-addr2line on a large
binary. e.g.
on my local machine (best of 3):

$ time eu-addr2line -e /usr/lib64/firefox/libxul.so 0x0157a892
/usr/src/debug/firefox-52.1.0/firefox-52.1.0esr/objdir/dom/bindings/ScrollViewChangeEventBinding.cpp:541

real 0m0.067s
user 0m0.050s
sys 0m0.017s

$ time addr2line -e /usr/lib64/firefox/libxul.so 0x0157a892
/usr/src/debug/firefox-52.1.0/firefox-52.1.0esr/objdir/dom/bindings/ScrollViewChangeEventBinding.cpp:541

real 0m25.984s
user 0m20.847s
sys 0m4.193s

So we definitely don't want to do what binutils does by default.

Note that the worst case is an address that doesn't match against any
function (e.g. what you might get if an unwind goes wrong). Currently
that is the cheapest case (not covered by any CU, so done). But if we
cannot rely on which addresses are covered by which CU then we have to
scan all of them just to make sure there really isn't a subroutine
description in there that does cover the address.

I want to avoid doing that "just in case", and only do it when we (or
the user) know the DWARF might come from a bad producer. So I am
pondering whether we should add something like -b, --bad, as a command
line argument for things like eu-addr2line and eu-stack, to indicate
that we need some workarounds for bad DWARF. That would then call
something like dwarf_force_aranges () or similar, which would set up an
aranges table created by explicitly scanning all CUs.

> Also, if it's really against all your expectations, shouldn't we report
> this upstream at clang and ask for input there? I can't believe they
> knowingly break their generated code in such a way. Rather, I believe
> it's either done unknowingly, or there is some alternative way to
> interpret the data that we are not aware of?

I think they are aware the DWARF they produce is broken. A quick search
finds lots of bug reports about it. The following two specifically seem
relevant for the above case:
https://bugs.llvm.org/show_bug.cgi?id=13351
https://bugs.llvm.org/show_bug.cgi?id=30569

Cheers,

Mark
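
P.S. To make the "forced aranges" idea above a bit more concrete, here is
a rough sketch as a self-contained toy model. It is NOT libdw code: the
types toy_cu, func_range and toy_arange and the functions
toy_force_aranges and toy_addr_to_cu are all hypothetical names made up
for illustration, and the sketch assumes the function ranges do not
overlap. The point is only that the expensive scan over every CU happens
once, up front, and each address lookup afterwards is a binary search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model only; none of these names exist in libdw. */
struct func_range { uint64_t low, high; };   /* [low, high) of one function */

struct toy_cu {
    const char *name;                 /* stand-in for a compilation unit */
    const struct func_range *funcs;   /* ranges found on its function DIEs */
    size_t nfuncs;
};

struct toy_arange { uint64_t low, high; size_t cu; };

static int cmp_arange(const void *a, const void *b)
{
    const struct toy_arange *x = a, *y = b;
    return (x->low > y->low) - (x->low < y->low);
}

/* Ignore whatever the CU itself claims to cover and build the table from
   the function ranges instead. Returns the number of entries, or
   (size_t) -1 if allocation fails. The caller frees *out. */
static size_t toy_force_aranges(const struct toy_cu *cus, size_t ncus,
                                struct toy_arange **out)
{
    assert(out != NULL);

    size_t total = 0;
    for (size_t i = 0; i < ncus; i++)
        total += cus[i].nfuncs;

    struct toy_arange *t = malloc((total ? total : 1) * sizeof *t);
    if (t == NULL)
        return (size_t) -1;

    size_t n = 0;
    for (size_t i = 0; i < ncus; i++)
        for (size_t j = 0; j < cus[i].nfuncs; j++) {
            const struct func_range *f = &cus[i].funcs[j];
            if (f->high > f->low)     /* skip empty or bogus ranges */
                t[n++] = (struct toy_arange) { f->low, f->high, i };
        }

    qsort(t, n, sizeof *t, cmp_arange);
    *out = t;
    return n;
}

/* Binary search over the sorted table (assumes non-overlapping ranges).
   Returns the CU index, or -1 if no function covers addr. */
static long toy_addr_to_cu(const struct toy_arange *t, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < t[mid].low)
            hi = mid;
        else if (addr >= t[mid].high)
            lo = mid + 1;
        else
            return (long) t[mid].cu;
    }
    return -1;
}
```

In a real implementation the scan would presumably walk the CUs with
dwarf_nextcu and visit the function DIEs via dwarf_child and
dwarf_siblingof, but how that would be wired into libdw is still an open
question.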