Hi Mark,

On Tue, Feb 13, 2024 at 8:28 AM Mark Wielaard <m...@klomp.org> wrote:
>
> > This patch's method of building the aranges list is slower than simply
> > reading .debug_aranges. On my machine, running eu-stack on a 2.9G
> > firefox core file takes about 8.7 seconds with this patch applied,
> > compared to about 3.3 seconds without this patch.
>
> That is significant. 2.5 times slower.
> Did you check with perf or some other profiler where exactly the extra
> time goes. Does the new method find more aranges (and so produces
> "better" backtraces)?

I took another look at the performance and realized I made a silly
mistake when I originally tested this. My build that was 2.5x slower
was compiled with -O0 but I tested it against an -O2 build. Oops!

With the optimization level set to -O2 in all cases, 'eu-stack -s' on
the original 2.9G firefox core file is only about 9% slower: 3.6 seconds
with the patch applied compared to 3.3 seconds without the patch.

As for the number of aranges found, there is a difference for
libxul.so: 250435 with the patch compared to 254832 without. So 4397
fewer aranges are found when using the new CU iteration method. I'll
dig into this and see if there is a problem or if it's just due to
some redundancy in libxul's .debug_aranges. FWIW there was no change
to the aranges counts for the other modules searched during this
eu-stack firefox corefile test.

> > Ideally we could assume that .debug_aranges is complete if it is present
> > and build the aranges list via CU iteration only when .debug_aranges
> > is absent. This would let us save time on gcc-compiled binaries, which
> > include complete .debug_aranges by default.
>
> Right. This is why the question is if the firefox case sees more/less
> aranges. If I remember correctly it is built with gcc and rustc, and
> rustc might not produce .debug_aranges.
>
> > However the DWARF spec appears to permit partially complete
> > .debug_aranges [1]. We could improve performance by starting with a
> > potentially incomplete list built from .debug_aranges. If a lookup
> > fails then search the CUs for missing aranges and add to the list
> > when found.
> >
> > This approach would complicate the dwarf_get_aranges interface. The
> > list it initially provides could no longer be assumed to be complete.
> > The number of elements in the list could change during calls to
> > dwarf_getarange{info, _addr}. This would invalidate the naranges value
> > set by dwarf_getaranges.
> > The current API doesn't include a way to
> > communicate to the caller when naranges changes and by how much.
> >
> > Due to these complications I think it's better to simply ignore
> > .debug_aranges altogether and build the aranges table via CU iteration,
> > as is done in this patch.
>
> Might it be an idea to leave dwarf_getaranges as it is and introduce a
> new (internal) function to get "dynamic" ranges? It looks like what
> programs (like eu-stack and eu-addr2line) really use is dwarf_addrdie
> and dwfl_module_addrdie. These are currently built on dwarf_getaranges,
> but could maybe use a new interface?

IMO this depends on what users expect from dwarf_getaranges. Do they
want the exact contents of .debug_aranges (whether or not it's complete)
or should dwarf_getaranges go beyond .debug_aranges to ensure the most
complete results? The comment for dwarf_getaranges in libdw.h simply
reads "Return list address ranges". Since there's no mention of
.debug_aranges specifically, I think it's fair if dwarf_getaranges does
whatever it can to ensure comprehensive results. In that case
dwarf_getaranges should probably generate aranges dynamically.

Aaron
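
For illustration only, here is a rough, self-contained model of the lazy
fallback discussed in the quoted text: look in a possibly incomplete
table first, and on a miss scan the per-CU ranges and cache the hit. It
does not use libdw. The names struct range, struct range_table,
table_add, find and lookup are hypothetical, and the linear searches
stand in for whatever sorted lookup a real implementation would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model types; NOT libdw's Dwarf_Aranges or Dwarf_Arange. */
struct range { uint64_t low, high; int cu; };   /* covers [low, high) */

struct range_table {
  struct range *items;
  size_t n;     /* element count; may grow during lookups */
  size_t cap;
};

/* Append one range, growing the array as needed.  */
static bool
table_add (struct range_table *t, struct range r)
{
  if (t->n == t->cap)
    {
      size_t cap = t->cap ? 2 * t->cap : 8;
      struct range *p = realloc (t->items, cap * sizeof *p);
      if (p == NULL)
        return false;
      t->items = p;
      t->cap = cap;
    }
  t->items[t->n++] = r;
  return true;
}

/* Linear search for the range containing ADDR, or NULL.  */
static const struct range *
find (const struct range *v, size_t n, uint64_t addr)
{
  for (size_t i = 0; i < n; i++)
    if (addr >= v[i].low && addr < v[i].high)
      return &v[i];
  return NULL;
}

/* Fast path: the (possibly incomplete) table.  Slow path: scan the
   per-CU ranges and cache the hit, which changes t->n.  */
static const struct range *
lookup (struct range_table *t, const struct range *cu_ranges,
        size_t ncu, uint64_t addr)
{
  assert (t != NULL);
  const struct range *hit = find (t->items, t->n, addr);
  if (hit != NULL)
    return hit;
  hit = find (cu_ranges, ncu, addr);
  if (hit == NULL || !table_add (t, *hit))
    return hit;   /* not found, or found but could not be cached */
  return &t->items[t->n - 1];
}
```

In this model a caller that saved t->n before calling lookup can end up
holding a stale count afterwards, which is the same difficulty the
quoted text raises for the naranges value set by dwarf_getaranges.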