Hi David,

I implemented some optimizations in the form of a specialized parser for fast DW_AT_ranges scanning, and performance is now comparable to lazy evaluation through .debug_aranges (only marginally worse, assuming the buffer cache is warmed up). We've since shipped with these optimizations.

I have to do some work in the same code base in March and will run a comparison then and share numbers here, including after dropping buffers. If you would benefit from having them sooner, let me know and I'll make it happen.
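For readers following along, here is a minimal sketch of the header-to-header hop that this kind of scan relies on (visit only the CU DIE, then jump to the next unit via the length in the unit header). It is not the parser that shipped; the names `UnitHeader` and `iter_units` are made up for illustration, and it assumes a little-endian .debug_info section already loaded into memory.

```python
import struct
from typing import Iterator, NamedTuple


class UnitHeader(NamedTuple):
    offset: int       # where this unit starts in .debug_info
    version: int      # DWARF version read from the header
    die_offset: int   # where the unit's first (CU) DIE starts
    next_offset: int  # where the next unit header starts


def iter_units(debug_info: bytes) -> Iterator[UnitHeader]:
    """Hop from unit header to unit header without walking child DIEs.

    Assumption: little-endian data. Hypothetical helper, not a library API.
    """
    pos = 0
    while pos + 4 <= len(debug_info):
        (length,) = struct.unpack_from("<I", debug_info, pos)
        cursor = pos + 4
        offset_size = 4
        if length == 0xFFFFFFFF:  # escape value that marks 64-bit DWARF
            (length,) = struct.unpack_from("<Q", debug_info, cursor)
            cursor += 8
            offset_size = 8
        next_offset = cursor + length  # unit_length excludes the length field
        (version,) = struct.unpack_from("<H", debug_info, cursor)
        cursor += 2
        if version >= 5:
            # v5 order: unit_type, address_size, debug_abbrev_offset
            unit_type = debug_info[cursor]
            cursor += 2 + offset_size
            if unit_type in (0x04, 0x05):    # skeleton / split compile: dwo_id
                cursor += 8
            elif unit_type in (0x02, 0x06):  # type units: signature + type_offset
                cursor += 8 + offset_size
        else:
            # v2 to v4 order: debug_abbrev_offset, address_size
            cursor += offset_size + 1
        yield UnitHeader(pos, version, cursor, next_offset)
        pos = next_offset
```

From `die_offset`, a reader would still need to decode the CU DIE's abbreviation and pick out DW_AT_ranges or low/high_pc; that step is left out here to keep the sketch short.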
On Thu, Feb 24, 2022 at 3:44 PM David Blaikie <dblai...@gmail.com> wrote:

> Hey Samy - curious if you ever happened to end up getting further details
> here.
>
> On Fri, Apr 9, 2021 at 1:05 PM Samy Al Bahra <sba...@repnop.org> wrote:
>
>> Thanks for the detailed response David.
>>
>> On Fri, Apr 9, 2021 at 2:52 PM David Blaikie <dblai...@gmail.com> wrote:
>>
>>> I'm not suggesting scanning all of .debug_info - only the CU DIE for
>>> DW_AT_ranges or high/low_pc, then skip to the next CU DIE (via the
>>> unit header's next unit offset).
>>>
>>> It sounded like CU ranges couldn't be used to build such an index at
>>> all/that your code used quite a different strategy in the absence of
>>> aranges? (rather than building the index from the CU ranges - somewhat
>>> slower I'm sure, but I wouldn't've thought (& am trying to understand
>>> if it is/why) so fundamentally slower that it wouldn't be the next
>>> fallback rather than skipping the index entirely or employing some
>>> more fundamentally different approach)
>>
>> This is still significantly less dense than aranges, involves more disk
>> I/O and memory pressure. Let me see what optimizations I can implement here
>> and get back to you with the results / what I came up with. This would be a
>> better basis for apples to apples comparison.
>>
>>> If you mean building ranges from all the DIEs deep inside a CU - yeah,
>>> that's going to be fundamentally slower in a bunch of ways that maybe
>>> I could see that would necessitate a totally different approach/that
>>> the index wouldn't make sense anymore (though I'd still like to
>>> understand it) - but I'm especially curious about the case where the
>>> CU DIE itself does have comprehensive address range information.
>>
>> Will report back on this.
>>
>>> - Dave
>>>
>>> >>> (+ complexities Greg mentions later in the thread).
>>> >>> In cases where we lack this, we use our own persistent cache which
>>> >>> introduces unnecessary complexity. Now I am considering going as far as
>>> >>> adding a multi-threaded indexer for cases where a persistent cache /
>>> >>> build system modifications aren't an option (work to begin in the next
>>> >>> week or two).
>>> >>>
>>> >>> .debug_aranges would provide a lot of value to our users.
>>> >>>
>>> >>> On Thu, Mar 11, 2021 at 3:48 PM David Blaikie via Dwarf-Discuss
>>> >>> <dwarf-discuss@lists.dwarfstd.org> wrote:
>>> >>>>
>>> >>>> On Thu, Mar 11, 2021 at 5:48 AM <paul.robin...@sony.com> wrote:
>>> >>>>>
>>> >>>>> Hopefully not to side-track things too much... maybe wants its own
>>> >>>>> thread, if there's more to debate here.
>>> >>>>
>>> >>>> Yeah, how about we spin it off into another thread (done here)
>>> >>>>
>>> >>>>> >> For the case you suggested where it would be useful to keep the range
>>> >>>>> >> list for the CU in the .o file, I think .debug_aranges is what you're
>>> >>>>> >> looking for.
>>> >>>>> >
>>> >>>>> > aranges has been off by default in LLVM for a while - it adds a lot of
>>> >>>>> > overhead (doesn't have all the nice rnglist encodings for instance -
>>> >>>>> > nor can it use debug_addr, and if it did it'd still be duplicate with
>>> >>>>> > the CU ranges wherever they were).
>>> >>>>>
>>> >>>>> Did you want to file an issue to improve how .debug_aranges works?
>>> >>>>
>>> >>>> I don't currently understand the value it provides, and I at least
>>> >>>> don't have a use case for it, so I'm not sure I'd be the best person to
>>> >>>> advocate/drive that work.
>>> >>>>
>>> >>>>> Complaining that it duplicates CU ranges is missing the point, though;
>>> >>>>> it's an index, like .debug_names, of course it duplicates other info.
>>> >>>>> If you want to suggest an improved index, like we did with .debug_names,
>>> >>>>> that would be great too.
>>> >>>>
>>> >>>> .debug_names is quite different though - it collects information
>>> >>>> from across the DIE tree - information that is expensive to otherwise
>>> >>>> gather (walking the whole DIE tree).
>>> >>>>
>>> >>>> .debug_aranges is not like that for most producers (producers that
>>> >>>> do include the address ranges on the CU DIE) - the data is readily
>>> >>>> available immediately on the CU. That does involve reading some of
>>> >>>> .debug_abbrev, and interpreting a handful of attributes - but at least
>>> >>>> for the use cases I'm aware of, that overhead isn't worth the size
>>> >>>> increase.
>>> >>>>
>>> >>>> Do you have numbers on the benefits of .debug_aranges compared to
>>> >>>> parsing the ranges from CU DIEs?
>>> >>>>
>>> >>>> (one possible issue: the CU doesn't /have/ to contain low/high/ranges
>>> >>>> if its children DIEs contain addresses - having that as a guarantee, or
>>> >>>> some preferred way of encoding zero length (high/low of 0 would be
>>> >>>> acceptable, I guess) would be nice & make it cheap to skip over CUs
>>> >>>> that don't have any address ranges)
>>> >>>>
>>> >>>> Roughly, a modern debug_aranges to me would look something like:
>>> >>>>
>>> >>>> <length>
>>> >>>> <version>
>>> >>>> <CU sec_offset>
>>> >>>> <addr_base>
>>> >>>> <rnglist sec_offset>
>>> >>>>
>>> >>>> So it could fully re-use the rnglist encoding. If this was going to be
>>> >>>> as compact as possible, it'd need to be configurable which encodings it
>>> >>>> uses - ranges V high/low, addrx V addr - at which point it'd probably
>>> >>>> look like a small DIE with an inline abbrev (similar to the way DWARFv5
>>> >>>> encodes the file and directory entries now, and how debug_names is
>>> >>>> self-describing) - at which point it looks to me a lot like parsing the
>>> >>>> CU DIEs.
>>> >>>>
>>> >>>> _______________________________________________
>>> >>>> Dwarf-Discuss mailing list
>>> >>>> Dwarf-Discuss@lists.dwarfstd.org
>>> >>>> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>>> >>>
>>> >>> --
>>> >>> Samy Al Bahra [http://repnop.org]
>>> >
>>> > --
>>> > Samy Al Bahra [http://repnop.org]
>>
>> --
>> Samy Al Bahra [http://repnop.org]
>

--
Samy Al Bahra [http://repnop.org]
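
As a footnote to the "modern debug_aranges" layout sketched in the quoted thread (<length>, <version>, <CU sec_offset>, <addr_base>, <rnglist sec_offset>), a reader for such an entry might look roughly like the following. This is purely illustrative: the format is a proposal and not part of any DWARF version, the field widths (32-bit length and offsets, 16-bit version, little-endian) are assumptions, and `ArangesEntry` and `parse_entry` are hypothetical names.

```python
import struct
from typing import NamedTuple


class ArangesEntry(NamedTuple):
    length: int          # <length>
    version: int         # <version>
    cu_offset: int       # <CU sec_offset> into .debug_info
    addr_base: int       # <addr_base> into .debug_addr
    rnglist_offset: int  # <rnglist sec_offset> into .debug_rnglists


# Assumed widths: u32 length, u16 version, three u32 section offsets.
_ENTRY = struct.Struct("<IHIII")


def parse_entry(data: bytes, pos: int = 0) -> ArangesEntry:
    """Decode one fixed-size entry of the hypothetical layout."""
    return ArangesEntry(*_ENTRY.unpack_from(data, pos))
```

With fixed-width fields like these, each entry is a single unpack; the trade-off raised in the thread is that making the encodings configurable would pull this closer to parsing the CU DIE itself.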