I have been working on elfutils DWARFv5 support based on what GCC7 implements and the current public draft. I haven't had time to implement everything, but since the public comment period was kept short I thought it would be nice to at least document some of the things that I stumbled over. Hopefully they help explain some of the choices I made, for anyone reviewing the patches on the elfutils mjw/DWARF5 branch.
Some of these things seem simple to fix or clarify, and I will also submit Issues for them to dwarfstd.org so they don't get forgotten. Others are just observations about things I struggled with, which might just need coordination between producers and consumers. I hope the notes are also useful for writing other DWARFv5 consumers.

BTW, it would be handy if there were sources for the spec so one can create patches for simple typos. It is also somewhat opaque how Issues are handled. Could they, and any comments from the committee, be sent to the mailing list to make tracking changes to the draft easier?

New Language Encodings.
The list still has DW_LANG_C_plus_plus_03, which was discussed some time ago on the dwarf-discuss list. Since C++03 doesn't introduce any language changes it isn't clear what this signifies. The original submitter also agreed it wasn't necessary. On the other hand there is no DW_LANG_C_plus_plus_17, which seems appropriate and necessary given that the final DWARFv5 will probably come out in 2017, as will GCC7 with C++17 language support.

Handling language specific DIE/Attribute properties in partial units.
This seems hard to handle in the abstract. I wonder if there is some guidance for producers on what kind of things can be moved into a partial unit to make things easier for consumers, for example the size of a variable or data structure. Subrange types, for instance, might omit the DW_AT_lower_bound attribute, in which case the CU language context determines the lower bound (0 for C, 1 for Fortran, etc.). What if a DW_TAG_subrange_type is placed in a partial unit? Then it seems good if the partial unit DIE has a language attribute. In general it would be good if the partial unit DIE had a language attribute, so the context is clear for a consumer/library that might need to provide language specific properties for a DIE. Otherwise it might be unclear or hard to get at all the (indirect) imports of the partial unit, and what if some of them have different language attributes?
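The consumer-side problem with an omitted DW_AT_lower_bound can be sketched roughly as follows. This is only an illustration, not elfutils code: the dict-based DIE representation and the function name `subrange_lower_bound` are hypothetical, and only a handful of DW_LANG codes are listed.

```python
# Hypothetical sketch: DIEs are modelled as plain dicts of attributes.
# A few DW_LANG_* codes and the default lower bound each one implies.
DW_LANG_C89 = 0x0001
DW_LANG_C = 0x0002
DW_LANG_C_plus_plus = 0x0004
DW_LANG_Fortran77 = 0x0007
DW_LANG_Fortran90 = 0x0008
DW_LANG_C99 = 0x000C
DW_LANG_Fortran95 = 0x000E

DEFAULT_LOWER_BOUND = {
    DW_LANG_C89: 0,
    DW_LANG_C: 0,
    DW_LANG_C_plus_plus: 0,
    DW_LANG_C99: 0,
    DW_LANG_Fortran77: 1,
    DW_LANG_Fortran90: 1,
    DW_LANG_Fortran95: 1,
}


def subrange_lower_bound(subrange_attrs, unit_attrs):
    """Return the lower bound of a subrange, or None if it is unknown.

    subrange_attrs: attributes of the DW_TAG_subrange_type DIE.
    unit_attrs: attributes of the unit DIE that contains it.
    """
    # An explicit attribute always wins.
    if "DW_AT_lower_bound" in subrange_attrs:
        return subrange_attrs["DW_AT_lower_bound"]
    # Otherwise the language of the containing unit decides.
    language = unit_attrs.get("DW_AT_language")
    if language is None:
        # For example a partial unit DIE without DW_AT_language:
        # the consumer would have to find the importing units.
        return None
    return DEFAULT_LOWER_BOUND.get(language)
```

With a language attribute on the partial unit DIE the lookup stays local; without one the function has nothing to go on and returns None.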
Alternatively, a producer placing a DIE into a partial unit might have to add any attributes, like the lower bound, that would be implicit if the unit DIE had a language attribute.

New FORMs.
DW_FORM_ref_sup doesn't describe how the offset is represented. Currently the assumption in elfutils is that it is 4 or 8 bytes depending on whether the containing unit is 32bit or 64bit DWARF. This would be consistent with DW_FORM_strp_sup. The consequence is that if the supplemental file has really big data sections you need a 64bit DWARF unit to reference everything in it.

There is no description of the representation of DW_FORM_line_strp, but DW_FORM_strp is mentioned twice. I assumed the second mention should just be DW_FORM_line_strp.

Classifying DW_FORM_data16 as a constant value is slightly confusing. Having to handle a 128bit value everywhere a constant value class is allowed is somewhat inconvenient, and such values really only make sense given a specific data representation/type. As given, it isn't immediately clear in which context one might have to byte-swap for different endianness. The fact that it is also used to represent the MD5 in the line table confused me a bit: I wrongly assumed that I might need to byte-swap because it was a constant value representation. It shouldn't be swapped, of course; it really is a hash represented by a block of bytes. For these reasons elfutils currently handles it as a (constant size) block class, which I hear is also what gdb does. In practice it seems to only impact DW_AT_const_value, for which consumers already had to handle blocks. Using it for other attributes doesn't really seem to make sense. Suggestion: rename it to DW_FORM_data16_block and put it in the block class instead of the constant class.

The new DW_FORM_implicit_const did eventually work out well, but there were surprisingly many places that assumed abbrevs were simple and didn't use much, if any, abstraction.
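How a consumer might size the forms discussed here can be sketched as below. This is a hedged illustration, not the elfutils implementation: `fixed_form_size` is a hypothetical helper, forms are identified by name rather than by numeric code, and the sizes for DW_FORM_ref_sup and DW_FORM_line_strp follow the assumptions described above rather than explicit wording in the draft.

```python
# Forms assumed to be encoded as a section offset, i.e. 4 bytes in
# 32bit DWARF and 8 bytes in 64bit DWARF.
OFFSET_SIZED_FORMS = {
    "DW_FORM_strp",
    "DW_FORM_strp_sup",
    "DW_FORM_line_strp",  # assumption: same representation as DW_FORM_strp
    "DW_FORM_ref_sup",    # assumption: consistent with DW_FORM_strp_sup
}


def fixed_form_size(form, offset_size):
    """Return how many bytes `form` occupies in the DIE data.

    offset_size is 4 for a 32bit DWARF unit and 8 for a 64bit one.
    Only the forms discussed in these notes are covered.
    """
    if offset_size not in (4, 8):
        raise ValueError("offset_size must be 4 or 8")
    if form in OFFSET_SIZED_FORMS:
        return offset_size
    if form == "DW_FORM_data16":
        # Treated as an opaque 16 byte block, never byte-swapped.
        return 16
    if form == "DW_FORM_implicit_const":
        # The value is stored in the abbreviation, not in the DIE.
        return 0
    raise ValueError("form not covered by this sketch: " + form)
```

Routing every size lookup through one helper like this is the kind of abstraction that keeps a form such as DW_FORM_implicit_const from needing a special case at each call site.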
The existing DW_FORM_indirect doesn't really seem to be handled very well, and it would break most of these places too.

Unit headers.
Having extra padding fields for all unit types seems a bit wasteful. Also, there is not enough information for a consumer to know whether it can handle anything from a unit whose unit type is unknown. Which, if any, of the fields following the unit_type are valid? Or is just the initial unit_length valid, so that the only valid operation is skipping the whole unit?

Having a place to store a unique identifier and a reference to a primary/sub DIE inside the unit is nice, and it could be made more generic by turning the unit_type field into a bit/flag field: one flag to indicate a type unit, one for a partial unit, one for a skeleton unit and one for a split unit. Some combinations don't make sense currently, but might in the future.

Or keep the current DW_UT values (1..6) as they are now, but limit the extensions to 15. Then use the remaining 4 bits as flags to indicate whether a unit header contains extra fields. You can define 2 already: one if the header contains an 8 byte ID field, and one if the header contains a DIE offset field (4 or 8 bytes). That basically gives you 16 values for describing the type and 16 for describing the header fields. You could shift them a bit if you think it is more important to have flexibility in either the unit types or the description of header fields/size.

Enumeration types.
It is allowed to have a DW_AT_byte_size on a DW_TAG_enumeration_type, but not a DW_AT_encoding. To describe both size and encoding one needs to use a DW_AT_type pointing to a base type that represents the "underlying type". For languages where enumerations don't have an underlying type, or for strongly typed enums, it is easier to attach the encoding directly than to add an indirection to a base type. Suggestion: add DW_AT_encoding to the attribute list for DW_TAG_enumeration_type.

Macro Information Header.
Macro information entries may be described in the opcode_operands_table, but some cannot be, because the forms they need are not in the list of allowed forms. In particular DW_FORM_strp_sup is missing, so DW_MACRO_define_sup and DW_MACRO_undef_sup cannot be described, and DW_FORM_ref_sup is missing, making it impossible to describe DW_MACRO_import_sup. This makes the code that checks for allowed forms slightly inconvenient: it should reject these macro descriptions if those forms are used in the table, but not if the entries are defined implicitly. Also, DW_FORM_line_strp isn't allowed, although it might be beneficial for describing files referenced by macros.
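The awkwardness of the form check for macro opcode descriptions can be sketched as follows. This is an illustration under assumptions: `ALLOWED_TABLE_FORMS` is a deliberately incomplete, illustrative subset and not the draft's actual list, and `check_opcode_table` and its dict layout are hypothetical names, not an existing API.

```python
# Illustrative subset of forms permitted in the opcode_operands_table.
# Assumption: DW_FORM_strp_sup, DW_FORM_ref_sup and DW_FORM_line_strp
# are absent, as described in the notes above.
ALLOWED_TABLE_FORMS = {
    "DW_FORM_data1", "DW_FORM_data2", "DW_FORM_data4", "DW_FORM_data8",
    "DW_FORM_udata", "DW_FORM_sdata", "DW_FORM_string", "DW_FORM_strp",
    "DW_FORM_sec_offset", "DW_FORM_block",
}


def check_opcode_table(opcode_operands_table):
    """Return the opcodes whose explicit description must be rejected.

    opcode_operands_table maps an opcode name to the list of operand
    form names given for it in the macro header. Opcodes that do not
    appear in the table keep their implicit (standard) definition and
    are therefore never rejected here, whatever forms they use.
    """
    rejected = []
    for opcode, forms in opcode_operands_table.items():
        if any(form not in ALLOWED_TABLE_FORMS for form in forms):
            rejected.append(opcode)
    return sorted(rejected)
```

The same opcode, DW_MACRO_define_sup for instance, is thus acceptable when it is left implicit but has to be rejected as soon as a producer spells it out in the table.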