On Thu, Sep 30, 2021 at 12:27 AM Mark Wielaard <m...@klomp.org> wrote:
>
> Hi KJ,
>
> On Sat, 2021-09-25 at 17:21 +1000, KJ Tsanaktsidis via Elfutils-devel
> wrote:
> > I'm writing a program that uses ptrace to poke at internal OpenSSL
> > data structures for another process. I'm using libdw to parse the
> > DWARF data for the copy of OpenSSL actually linked in to the target
> > process, so I can extract struct offsets, member sizes and the like
> > and poke at the right places.
> >
> > I've run into an issue where dwarf_aggregate_size can't calculate the
> > size of an array, when the array is included in a partial CU
> > (DW_TAG_partial_unit). If the array unit includes a DW_AT_upper_bound
> > attribute, but not a DW_AT_lower_bound attribute, then
> > dwarf_aggregate_size will infer the lower bound based on the
> > DW_AT_language attribute of the enclosing CU (i.e. whether the language
> > uses zero or one based indexing).
> >
> > However, the debug symbols I'm looking at for OpenSSL from the Ubuntu
> > repositories have the DW_AT_language on the full compilation unit
> > entries, but not in the partial ones included in them. This means that
> > calling dwarf_aggregate_size on the array type DIE does not work.
>
> That is indeed a problem, since dwarf_aggregate_size doesn't provide
> another way to provide the language to use for the
> dwarf_default_lower_bound call. And the default is to return a
> DWARF_E_UNKNOWN_LANGUAGE error.
>
> Maybe we should change the default to assume the lower bound is zero?
>
> > The DWARF spec doesn't really seem to have anything to say on the
> > matter (all it says is "A full or partial compilation unit entry may
> > have the following attributes", but doesn't say what it logically
> > means if an attribute is present on the complete CU but not a partial
> > one).
>
> I think it is assumed that it inherits those attributes from the CU
> from which the partial one was imported and/or from the CU of the DIE
> that referenced the DIE in the partial unit. But I don't think it is
> easy to track that with libdw currently.
>
> > I guess it doesn't really make sense for a single compilation unit to
> > contain multiple languages? So I wonder if dwarf_srclang (called by
> > dwarf_aggregate_size) should crawl through the list of CU's to see if
> > the DIE's CU is included in a CU that _does_ specify DW_AT_language
> > (recursively, I suppose). Then, we can infer that the partial CU's
> > language is the same as the enclosing one.
> >
> > If people reckon this is a good idea (or, have a better one!), I'm
> > happy to try and put together a patch.
>
> I think that suggestion is sound, but really expensive. It also is
> somewhat tricky if you have alt files, you'll have to track back to the
> original Dwarf to see if it imports one of the partial units from the
> alt file.
>
> But I also don't have a good alternative idea. We could maybe have a
> variant of dwarf_aggregate_size that takes a language default value,
> but that doesn't seem like a very generic solution. Or maybe a variant
> of dwarf_srclang that takes any DIE (not just a CU DIE) and which tries
> to figure out the best language to use, which falls back to some
> default value if it cannot figure out what the language is that can be
> used with dwarf_default_lower_bound to get a default (most likely
> zero)?
>
> We could also ask producers (like dwz) to always include a
> DW_AT_language for partial units they create. But that of course makes
> the partial units bigger (and at least dwz creates them to make the
> full debuginfo smaller).
>
> Cheers,
>
> Mark
>
I agree, we don't want to hide a really expensive traversal inside what looks like a simple call to dwarf_aggregate_size.

What if we instead provide a way for the user to tell libdw what language a CU is? Something like "dwarf_cu_report_language(Dwarf_Die *cu, int lang)". The language would be saved with the (partial) CU, and dwarf_srclang would return it whenever DW_AT_language isn't set. The user could then recursively traverse all the CUs and call dwarf_cu_report_language on each partial CU.

As a bonus, we could even wrap that traversal up in dwarf_cu_traverse_partial_cu_set_language or something (OK, the name needs a bit of workshopping). That way, the expensive part lives in a separate call that is documented as being very expensive, and because the result is cached it only needs to be done once.

Does that sound like a reasonable approach?
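
To make the idea a bit more concrete, here is a rough, self-contained sketch in plain C. It deliberately does not use libdw. struct toy_cu and every toy_* function are made-up stand-ins for illustration only (toy_cu_report_language models the proposed dwarf_cu_report_language, which does not exist today), and the "imports" array is just my assumption about how DW_TAG_imported_unit references could be modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these types or functions are part of libdw. */
enum { LANG_UNKNOWN = -1 };
enum { LANG_C99 = 0x0c };   /* same numeric value as DW_LANG_C99 */

#define MAX_IMPORTS 4

struct toy_cu {
    int attr_language;      /* models DW_AT_language; LANG_UNKNOWN if absent */
    int reported_language;  /* the proposed per-CU cache set by the user */
    struct toy_cu *imports[MAX_IMPORTS]; /* models DW_TAG_imported_unit refs */
    size_t n_imports;
};

/* Hypothetical stand-in for the proposed dwarf_cu_report_language. */
static void toy_cu_report_language(struct toy_cu *cu, int lang)
{
    assert(cu != NULL);
    cu->reported_language = lang;
}

/* Models dwarf_srclang with the proposed fallback: prefer the unit's own
   DW_AT_language, otherwise use whatever the user reported. */
static int toy_srclang(const struct toy_cu *cu)
{
    assert(cu != NULL);
    if (cu->attr_language != LANG_UNKNOWN)
        return cu->attr_language;
    return cu->reported_language;
}

/* Push `lang` down through everything `cu` imports. Units that already
   have a language (their own or a cached one) are skipped, which also
   stops the recursion on import cycles. First reporter wins. */
static void toy_propagate(struct toy_cu *cu, int lang)
{
    for (size_t i = 0; i < cu->n_imports; i++) {
        struct toy_cu *child = cu->imports[i];
        if (toy_srclang(child) != LANG_UNKNOWN)
            continue;
        toy_cu_report_language(child, lang);
        toy_propagate(child, lang);
    }
}

/* The explicit, expensive pass: start from every unit that carries its
   own DW_AT_language and label the units reachable from it. */
static void toy_set_partial_cu_languages(struct toy_cu **cus, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int lang = cus[i]->attr_language;
        if (lang != LANG_UNKNOWN)
            toy_propagate(cus[i], lang);
    }
}
```

The caller would run toy_set_partial_cu_languages once up front, and after that every toy_srclang lookup is a cheap field read. A partial unit imported from CUs with different languages just keeps the first language reported to it. That is a simplification in this sketch, and the real thing would need to decide what to do there (and how alt files fit in).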