On Thu, Feb 24, 2022 at 2:24 PM Samy Al Bahra <sba...@repnop.org> wrote:
> Hi David
>
> I implemented some optimizations in the form of a specialized parser for
> fast AT_ranges scanning and performance is now comparable to lazy
> evaluation through .debug_aranges (only marginally worse assuming buffer
> cache warmed up). We've since shipped with these optimizations. I have to
> do some work in the same code base in March and will run a comparison then
> / share numbers here including after dropping buffers. If you would benefit
> from having them sooner, let me know and I'll make it happen.
>

No rush - just came across the thread and was curious if there were any
updates/closure/lessons to factor in, etc. I'm glad to hear you ended up
with fairly similar performance - that matches my expectation, that there
aren't some hidden scalability issues here. But yeah, curious to hear more
if/when you happen to have more to share.

- Dave

>
> On Thu, Feb 24, 2022 at 3:44 PM David Blaikie <dblai...@gmail.com> wrote:
>
>> Hey Samy - curious if you ever happened to end up getting further details
>> here.
>>
>> On Fri, Apr 9, 2021 at 1:05 PM Samy Al Bahra <sba...@repnop.org> wrote:
>>
>>> Thanks for the detailed response David.
>>>
>>> On Fri, Apr 9, 2021 at 2:52 PM David Blaikie <dblai...@gmail.com> wrote:
>>>
>>>> I'm not suggesting scanning all of .debug_info - only the CU DIE for
>>>> DW_AT_ranges or high/low_pc, then skip to the next CU DIE (via the
>>>> unit header's next unit offset).
>>>>
>>>> It sounded like CU ranges couldn't be used to build such an index at
>>>> all/that your code used quite a different strategy in the absence of
>>>> aranges? (rather than building the index from the CU ranges - somewhat
>>>> slower I'm sure, but I wouldn't've thought (& am trying to understand
>>>> if it is/why) so fundamentally slower that it wouldn't be the next
>>>> fallback rather than skipping the index entirely or employing some
>>>> more fundamentally different approach)
>>>
>>> This is still significantly less dense than aranges, involves more disk
>>> I/O and memory pressure. Let me see what optimizations I can implement here
>>> and get back to you with the results / what I came up with. This would be a
>>> better basis for apples to apples comparison.
>>>
>>>> If you mean building ranges from all the DIEs deep inside a CU - yeah,
>>>> that's going to be fundamentally slower in a bunch of ways that maybe
>>>> I could see that would necessitate a totally different approach/that
>>>> the index wouldn't make sense anymore (though I'd still like to
>>>> understand it) - but I'm especially curious about the case where the
>>>> CU DIE itself does have comprehensive address range information.
>>>
>>> Will report back on this.
>>>
>>>> - Dave
>>>>
>>>> >>> (+ complexities Greg mentions later in the thread). In cases where
>>>> >>> we lack this, we use our own persistent cache which introduces
>>>> >>> unnecessary complexity. Now I am considering going as far as adding
>>>> >>> a multi-threaded indexer for cases where a persistent cache / build
>>>> >>> system modifications aren't an option (work to begin in the next
>>>> >>> week or two).
>>>> >>>
>>>> >>> .debug_aranges would provide a lot of value to our users.
>>>> >>>
>>>> >>> On Thu, Mar 11, 2021 at 3:48 PM David Blaikie via Dwarf-Discuss <
>>>> >>> dwarf-discuss@lists.dwarfstd.org> wrote:
>>>> >>>>
>>>> >>>> On Thu, Mar 11, 2021 at 5:48 AM <paul.robin...@sony.com> wrote:
>>>> >>>>>
>>>> >>>>> Hopefully not to side-track things too much... maybe wants its own
>>>> >>>>> thread, if there's more to debate here.
>>>> >>>>
>>>> >>>> Yeah, how about we spin it off into another thread (done here)
>>>> >>>>
>>>> >>>>> >> For the case you suggested where it would be useful to keep the
>>>> >>>>> >> range list for the CU in the .o file, I think .debug_aranges is
>>>> >>>>> >> what you're looking for.
>>>> >>>>> >
>>>> >>>>> > aranges has been off by default in LLVM for a while - it adds a
>>>> >>>>> > lot of overhead (doesn't have all the nice rnglist encodings for
>>>> >>>>> > instance - nor can it use debug_addr, and if it did it'd still be
>>>> >>>>> > duplicate with the CU ranges wherever they were).
>>>> >>>>>
>>>> >>>>> Did you want to file an issue to improve how .debug_aranges works?
>>>> >>>>
>>>> >>>> I don't currently understand the value it provides, and I at least
>>>> >>>> don't have a use case for it, so I'm not sure I'd be the best person
>>>> >>>> to advocate/drive that work.
>>>> >>>>
>>>> >>>>> Complaining that it duplicates CU ranges is missing the point,
>>>> >>>>> though; it's an index, like .debug_names, of course it duplicates
>>>> >>>>> other info. If you want to suggest an improved index, like we did
>>>> >>>>> with .debug_names, that would be great too.
>>>> >>>>
>>>> >>>> .debug_names is quite different though - it collects information
>>>> >>>> from across the DIE tree - information that is expensive to
>>>> >>>> otherwise gather (walking the whole DIE tree).
>>>> >>>>
>>>> >>>> .debug_aranges is not like that for most producers (producers that
>>>> >>>> do include the address ranges on the CU DIE) - the data is readily
>>>> >>>> available immediately on the CU. That does involve reading some of
>>>> >>>> .debug_abbrev, and interpreting a handful of attributes - but at
>>>> >>>> least for the use cases I'm aware of, that overhead isn't worth the
>>>> >>>> size increase.
>>>> >>>>
>>>> >>>> Do you have numbers on the benefits of .debug_aranges compared to
>>>> >>>> parsing the ranges from CU DIEs?
>>>> >>>>
>>>> >>>> (one possible issue: the CU doesn't /have/ to contain
>>>> >>>> low/high/ranges if its children DIEs contain addresses - having that
>>>> >>>> as a guarantee, or some preferred way of encoding zero length
>>>> >>>> (high/low of 0 would be acceptable, I guess) would be nice & make it
>>>> >>>> cheap to skip over CUs that don't have any address ranges)
>>>> >>>>
>>>> >>>> Roughly, a modern debug_aranges to me would look something like:
>>>> >>>>
>>>> >>>> <length>
>>>> >>>> <version>
>>>> >>>> <CU sec_offset>
>>>> >>>> <addr_base>
>>>> >>>> <rnglist sec_offset>
>>>> >>>>
>>>> >>>> So it could fully re-use the rnglist encoding. If this was going
>>>> >>>> to be as compact as possible, it'd need to be configurable which
>>>> >>>> encodings it uses - ranges V high/low, addrx V addr - at which point
>>>> >>>> it'd probably look like a small DIE with an inline abbrev (similar
>>>> >>>> to the way DWARFv5 encodes the file and directory entries now, and
>>>> >>>> how debug_names is self-describing) - at which point it looks to me
>>>> >>>> a lot like parsing the CU DIEs.
>>>> >>>>
>>>> >>>> _______________________________________________
>>>> >>>> Dwarf-Discuss mailing list
>>>> >>>> Dwarf-Discuss@lists.dwarfstd.org
>>>> >>>> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
>>>> >>>
>>>> >>> --
>>>> >>> Samy Al Bahra [http://repnop.org]
>>>> >
>>>> > --
>>>> > Samy Al Bahra [http://repnop.org]
>>>
>>> --
>>> Samy Al Bahra [http://repnop.org]
>
> --
> Samy Al Bahra [http://repnop.org]
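
[Editor's sketch, not from any poster in the thread.] The approach David describes in the quoted text (touch only each unit's header and first DIE, then jump to the next unit via the header's length) might look roughly like the Python below. It assumes the raw bytes of .debug_info are already in memory; the names `UnitHeader` and `iter_units` are hypothetical. It only walks the headers and does not decode .debug_abbrev or the CU DIE's DW_AT_ranges / low/high_pc attributes, which is the remaining work a real scanner would need.

```python
import struct
from typing import Iterator, NamedTuple


class UnitHeader(NamedTuple):
    offset: int       # where this unit starts in .debug_info
    version: int      # DWARF version read from the header
    die_offset: int   # where the unit's first (CU) DIE starts
    next_offset: int  # where the following unit starts


def iter_units(debug_info: bytes, little_endian: bool = True) -> Iterator[UnitHeader]:
    """Walk unit headers only, hopping unit to unit via unit_length."""
    end = "<" if little_endian else ">"
    pos = 0
    while pos < len(debug_info):
        (length,) = struct.unpack_from(end + "I", debug_info, pos)
        cursor = pos + 4
        offset_size = 4
        if length == 0xFFFFFFFF:  # 64-bit DWARF: real length follows
            (length,) = struct.unpack_from(end + "Q", debug_info, cursor)
            cursor += 8
            offset_size = 8
        next_offset = cursor + length  # length counts bytes after itself
        (version,) = struct.unpack_from(end + "H", debug_info, cursor)
        cursor += 2
        if version >= 5:
            unit_type = debug_info[cursor]
            # unit_type (1) + address_size (1) + debug_abbrev_offset
            cursor += 2 + offset_size
            if unit_type in (0x04, 0x05):    # skeleton / split compile: dwo_id
                cursor += 8
            elif unit_type in (0x02, 0x06):  # type units: signature + type_offset
                cursor += 8 + offset_size
        else:
            # DWARF 2-4: debug_abbrev_offset + address_size (1)
            cursor += offset_size + 1
        yield UnitHeader(pos, version, cursor, next_offset)
        pos = next_offset  # skip the rest of the unit without reading it
```

A caller would read the abbreviation code at `die_offset`, look it up in .debug_abbrev, pull the range attributes off that one DIE, and move on to `next_offset`; nothing deeper in the unit is read.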