[Dwarf-Discuss] DWARF and source text embedding
Hello all,

I am a compiler engineer at AMD, working on tools for debugging online-compiled programs. The problem I am attempting to solve was brought up previously in DWARF Standard issue 161018.1, titled "DWARF-embedded source for online-compiled programs". It arises from runtimes like OpenCL doing online compilation in an environment where it is not desirable (or even feasible) to write sources to disk. In these cases, it would be useful to support embedding the source directly in the resulting DWARF.

I would like to propose a solution similar to the one outlined in the above issue, but without structural changes to the specification: add two new optional fields to the file_names prologue of the line table.

Section 6.2.4.1: Add two bullets after "5. DW_LNCT_MD5"

6. DW_LNCT_has_source
DW_LNCT_has_source indicates that the value is a boolean which affects the interpretation of an accompanying DW_LNCT_source value. When present, there must be an accompanying DW_LNCT_source value. When true, consumers may use the embedded source instead of attempting to discover the source on disk. When false, consumers will ignore the DW_LNCT_source value. This code point is always paired with a flag form (e.g. DW_FORM_flag or DW_FORM_flag_present).

7. DW_LNCT_source
DW_LNCT_source indicates that the value is a null-terminated string which is the original source text of the file. When present, there must be an accompanying DW_LNCT_has_source value. The string will contain the UTF-8 encoded source text with '\n' line endings. When the accompanying DW_LNCT_has_source value is false, the value of DW_LNCT_source will be the empty string. This code point is always paired with a string form (e.g. DW_FORM_string, DW_FORM_line_strp, DW_FORM_strp).

New type codes can be allocated for them in a backwards-compatible way, or codes for these new content types can be allocated in the range [DW_LNCT_lo_user, DW_LNCT_hi_user] to avoid changing the spec itself.
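To make the proposed semantics concrete, here is a minimal Python sketch of how a consumer that understands the extension might decode file_names entries and decide whether to use embedded source. It is not taken from the LLVM patches. It assumes the proposed code points DW_LNCT_has_source = 0x6 and DW_LNCT_source = 0x7 (not part of DWARF 5 as published), handles only the inline forms DW_FORM_string, DW_FORM_flag, DW_FORM_flag_present and DW_FORM_udata (a real consumer would also resolve DW_FORM_line_strp/DW_FORM_strp offsets), and the helper names are hypothetical.

```python
# Minimal sketch of the proposed has_source/source semantics; not a real
# DWARF library. The two content-type codes are the values suggested in
# the proposal (Table 7.27) and are not part of DWARF 5 as published.
DW_LNCT_path = 0x1
DW_LNCT_has_source = 0x6  # proposed
DW_LNCT_source = 0x7      # proposed

DW_FORM_string = 0x08
DW_FORM_flag = 0x0C
DW_FORM_udata = 0x0F
DW_FORM_flag_present = 0x19


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, new position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            return result, pos


def read_form(data: bytes, pos: int, form: int):
    """Read one value of the given form; return (value, new position)."""
    if form == DW_FORM_string:  # inline, NUL-terminated UTF-8
        end = data.index(b"\0", pos)
        return data[pos:end].decode("utf-8"), end + 1
    if form == DW_FORM_flag:  # one byte, nonzero means true
        return data[pos] != 0, pos + 1
    if form == DW_FORM_flag_present:  # implicitly true, occupies no bytes
        return True, pos
    if form == DW_FORM_udata:
        return read_uleb128(data, pos)
    raise NotImplementedError(f"form 0x{form:x} is not handled in this sketch")


def read_entry(data: bytes, pos: int, entry_format):
    """Decode one file_names entry described by (content type, form) pairs."""
    fields = {}
    for content_type, form in entry_format:
        fields[content_type], pos = read_form(data, pos, form)
    return fields, pos


def embedded_source(fields: dict):
    """Return the embedded text, or None if the consumer should look on disk."""
    if fields.get(DW_LNCT_has_source, False):
        return fields[DW_LNCT_source]
    return None  # flag false or absent: ignore DW_LNCT_source


# Example: two entries sharing one entry format.
entry_format = [
    (DW_LNCT_path, DW_FORM_string),
    (DW_LNCT_has_source, DW_FORM_flag),
    (DW_LNCT_source, DW_FORM_string),
]
data = (b"kernel.cl\0" + b"\x01" + b"kernel void k() {}\n\0"
        + b"gen.h\0" + b"\x00" + b"\0")
first, pos = read_entry(data, 0, entry_format)
second, pos = read_entry(data, pos, entry_format)
print(repr(embedded_source(first)))
print(embedded_source(second))  # None
```

Note that the second entry still carries an (empty) DW_LNCT_source string, because every entry in a line table shares the same entry format.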
Table 7.27:
Add DW_LNCT_has_source 0x6
Add DW_LNCT_source 0x7

Any DWARFv5 consumer which is unaware of this extension would continue to operate as before, ignoring the new fields. Any consumer which is aware of the extension would know to check DW_LNCT_has_source for each file_name entry to determine whether the embedded source field (DW_LNCT_source) contains the source text of the corresponding file.

My team and I believe this simplifies the design by removing the need for changes to the compile unit sections, and by avoiding the addition of multiple file_name_entry_formats in a single program, all without sacrificing any information.

We have a preliminary implementation in LLVM/Clang, which supports embedding source (clang -gdwarf-5 -gembed-source) and inspecting it via llvm-dwarfdump and llvm-objdump (with the -source flag). The patches are available at https://reviews.llvm.org/D42765 (LLVM) and https://reviews.llvm.org/D42766 (Clang).

I would like any and all feedback on the design, and would like to explore the possibility of adding the new content type codes outside of the "user" range (i.e. adding new entries for them in Table 7.27) in the next version of the specification.

Regards,
Scott Linder
___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
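The backward-compatibility claim works because every (content type, form) pair in the entry format tells a consumer how many bytes the value occupies, so an unrecognized content type can simply be stepped over. A minimal Python sketch, assuming 32-bit DWARF offsets and handling only a few forms; the proposed codes 0x6 and 0x7 play the role of unknown types, and the helper names are hypothetical:

```python
# Sketch of a plain DWARF 5 consumer that only knows the standard content
# types and skips any others by consuming the bytes their form implies.
KNOWN_LNCT = {0x1: "path", 0x2: "directory_index", 0x3: "timestamp",
              0x4: "size", 0x5: "MD5"}

DW_FORM_string = 0x08
DW_FORM_flag = 0x0C
DW_FORM_strp = 0x0E
DW_FORM_udata = 0x0F
DW_FORM_flag_present = 0x19
DW_FORM_data16 = 0x1E
DW_FORM_line_strp = 0x1F


def form_end(data: bytes, pos: int, form: int, offset_size: int = 4) -> int:
    """Return the position just past a value of `form` starting at `pos`."""
    if form == DW_FORM_string:
        return data.index(b"\0", pos) + 1
    if form == DW_FORM_flag:
        return pos + 1
    if form == DW_FORM_flag_present:
        return pos
    if form == DW_FORM_udata:
        while data[pos] & 0x80:  # continuation bit set: more bytes follow
            pos += 1
        return pos + 1
    if form == DW_FORM_data16:
        return pos + 16
    if form in (DW_FORM_strp, DW_FORM_line_strp):
        return pos + offset_size  # 4 bytes in 32-bit DWARF, 8 in 64-bit
    raise NotImplementedError(f"form 0x{form:x} is not handled in this sketch")


def parse_entry(data: bytes, pos: int, entry_format):
    """Keep the raw bytes of known content types; skip unknown ones."""
    known = {}
    for content_type, form in entry_format:
        end = form_end(data, pos, form)
        if content_type in KNOWN_LNCT:
            known[KNOWN_LNCT[content_type]] = data[pos:end]
        pos = end  # unknown types (e.g. 0x6, 0x7) are stepped over
    return known, pos


entry_format = [(0x1, DW_FORM_string), (0x6, DW_FORM_flag),
                (0x7, DW_FORM_line_strp), (0x5, DW_FORM_data16)]
data = b"kernel.cl\0" + b"\x01" + (0x40).to_bytes(4, "little") + bytes(range(16))
fields, end = parse_entry(data, 0, entry_format)
print(sorted(fields))  # ['MD5', 'path']
```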
Re: [Dwarf-Discuss] DWARF and source text embedding
Hi John,

In the case where the files are actually available on disk, and the source is simply being "cached", the attributes are exactly the same. In the case where sources are generated, and so have no true path on disk, I would suggest we might just leave the exact meaning to be implementation defined; the producer can still provide valuable information which will aid in locating where sources originate, such as indicating the OpenCL kernel name. Consumers which are unaware of this extension will simply fail to find the source (as before), while new consumers can at least provide an identifier to distinguish sources.

The remaining attributes (DW_AT_language, DW_AT_producer, etc.) seem pretty naturally orthogonal.

Regards,
Scott

On 2018-01-31 14:40, John DelSignore wrote:
Hi Scott,

Question: What does the DW_TAG_compile_unit look like for an embedded source file? For example, what does the DW_AT_name and DW_AT_comp_dir look like?

Cheers,
John D.

On 01/31/18 17:05, sc...@scottlinder.com wrote:
[...]
Re: [Dwarf-Discuss] DWARF and source text embedding
Hi Paul,

My intention was to support an empty source string; I want to be explicit about the presence of embedded source for each file. When reading the spec I did notice places where an empty string can indicate the absence of the attribute (e.g. DW_AT_name), but I would prefer to be explicit here.

Scott

On 2018-02-01 11:19, paul.robin...@sony.com wrote:
-Original Message-
From: Dwarf-Discuss [mailto:dwarf-discuss-boun...@lists.dwarfstd.org] On Behalf Of sc...@scottlinder.com
Sent: Wednesday, January 31, 2018 2:05 PM
To: dwarf-discuss@lists.dwarfstd.org
Subject: [Dwarf-Discuss] DWARF and source text embedding
[...]
When the accompanying DW_LNCT_has_source value is false, the value of DW_LNCT_source will be the empty string. [...]

Would a zero-length string indicate something other than "has_source=false"? If not, then a separate has_source flag seems redundant.

--paulr
Re: [Dwarf-Discuss] DWARF and source text embedding
Michael, Paul,

In the current proposal, it is not an error to have any value (including an empty string) in the _source attribute when the _has_source flag is true, which allows for embedding an empty source string.

After seeing more feedback on this point, I think you are right that the extra flag is unnecessary. Looking at how similar attributes like MD5 are handled, I think it would be best to modify the proposal to remove the flag and require the source to be present on all files in the same line table if the attribute is present in the prologue.

I still think we should have wording which indicates that an empty string is a valid value for embedded source, and should not be interpreted as indicating the absence of embedded source for that file. This is analogous to the current MD5 attribute, as even 16 null bytes is a valid MD5.

What are your thoughts on this approach?

Scott

On 2018-02-01 17:20, Michael Eager wrote:
On 02/01/2018 12:01 PM, sc...@scottlinder.com wrote:
Hi Paul,
My intention was to support an empty source string; I want to be explicit about the presence of embedded source for each file.

I'm not fond of the belt and suspenders approach. If there is one specifier for an attribute, there's no need for a second to say that it's valid. There's always the issue of what it means when the two attributes disagree, such as when you have a flag saying that there is embedded source, but the source string is empty. Is that an error?

When reading the spec I did notice places where an empty string can indicate the absence of the attribute (e.g. DW_AT_name), but I would prefer to be explicit here.
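Under the revised approach, the entry format alone says whether a line table embeds source, and an empty string is a real (empty) file rather than "no source", much as 16 zero bytes are still a valid MD5. A minimal Python sketch of that interpretation, assuming entries have already been decoded into dicts keyed by content type; 0x7 is the code suggested earlier in the thread and the function name is hypothetical:

```python
from typing import Optional

# Sketch of the revised design: no has_source flag. Whether sources are
# embedded is decided by the line table's entry format, and every entry
# then carries a source string, which may be empty.
DW_LNCT_path = 0x1
DW_LNCT_source = 0x7   # code suggested in the proposal; illustrative only
DW_FORM_string = 0x08


def embedded_sources(entry_format, entries) -> list[Optional[str]]:
    """Map each decoded file_names entry to its embedded source or None.

    None means "no embedded source; look on disk". An empty string is a
    valid, empty source file and is returned unchanged.
    """
    if not any(content_type == DW_LNCT_source for content_type, _ in entry_format):
        return [None] * len(entries)
    return [entry[DW_LNCT_source] for entry in entries]


fmt_with = [(DW_LNCT_path, DW_FORM_string), (DW_LNCT_source, DW_FORM_string)]
fmt_without = [(DW_LNCT_path, DW_FORM_string)]
entries = [{DW_LNCT_path: "empty.cl", DW_LNCT_source: ""},
           {DW_LNCT_path: "k.cl", DW_LNCT_source: "kernel void k() {}\n"}]
print(embedded_sources(fmt_with, entries))
print(embedded_sources(fmt_without, [{DW_LNCT_path: "k.cl"}]))  # [None]
```

This removes the disagreement case Michael raises: there is no second attribute that could contradict the source string.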
Re: [Dwarf-Discuss] DWARF and source text embedding
Michael,

In the case of this proposal, then, I suggest the CU fields (AT_{name,comp_dir}) retain their exact current definitions. Language implementations, regardless of whether they might want to support embedding source, currently use the filesystem. This extension essentially just caches source which may become unavailable to the consumer by the time the program is debugged. This means the producer can put standard values in each CU field, and also embed source in the line table. If in the future there is a need to add CU fields or modify existing ones to capture some other attribute, that can be done in a different proposal.

Scott

On 2018-02-01 17:32, Michael Eager wrote:
On 02/01/2018 08:07 AM, sc...@scottlinder.com wrote:
Hi John,
In the case where the files are actually available on disk, and the source is simply being "cached", the attributes are exactly the same. In the case where sources are generated, and so have no true path on disk, I would suggest we might just leave the exact meaning to be implementation defined; the producer can still provide valuable information which will aid in locating where sources originate, such as indicating the OpenCL kernel name. Consumers which are unaware of this extension will simply fail to find the source (as before), while new consumers can at least provide an identifier to distinguish sources.

Implementation-defined generally means that different implementations will be incompatible. Incompatible implementations are the antithesis of a standard.

As a general DWARF principle, there should be no secret understandings between producer and consumer. There should be no "secret handshake" such as the one you describe, where a producer provides "valuable information" in some undefined manner usable only by a consumer which is "in on the secret". It's not that a different consumer doesn't implement the extension; it's that a different consumer cannot implement the extension.
Attributes which have a defined meaning, such as AT_name or AT_comp_dir, should have a well defined meaning in all circumstances.
[Dwarf-discuss] Enhancement: Expression Operation Vendor Extensibility Opcode
Background
==========

The vendor extension encoding space for DWARF expression operations (DW_OP_lo_user through DW_OP_hi_user) accommodates only 32 unique operations. In practice, the lack of a central registry and a desire for backwards compatibility mean vendor extensions are never retired, even when standard versions are accepted into DWARF proper. As a result, the effective encoding space available for new vendor extensions is minuscule today.

To expand this encoding space we propose defining one DWARF operation in the official encoding space which acts as a "prefix" for vendor extensions. It is followed by a ULEB128-encoded vendor extension opcode, which is then followed by the operands of the corresponding vendor extension operation.

This scheme opens up an unbounded encoding space for arbitrary vendor extensions, and in practical terms is no less compact than if a fixed-size encoding were chosen, as was done for DW_LNS_extended_op. That is to say, compared with an alternative scheme which encodes the opcode in a single unsigned byte: for the first 127 opcodes (1 through 127, since opcode 0 is reserved) our approach is indistinguishable from the alternative scheme; for the next 128 opcodes (128 through 255) it requires one more byte than the alternative scheme; and after 255 opcodes the alternative scheme is exhausted.

Since vendor extension operations can have arbitrary semantics, the consumer must understand them to be able to continue evaluating the expression. The only use for a size operand would be for a consumer that only needs to print the expression. Omitting a size operand makes the operation encoding more compact, and this was deemed more important than the limited printing use case. Therefore, unlike DW_LNS_extended_op, no ULEB128 size operand is present to provide the number of bytes of following operands.
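The size comparison above can be checked mechanically. A minimal Python sketch, assuming the alternative is a single unsigned opcode byte with 0 reserved (mirroring the reserved vendor opcode 0 in the proposal); the function names are illustrative:

```python
def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def extra_bytes_vs_fixed(opcode: int) -> int:
    """Bytes a ULEB128 opcode costs beyond a one-byte fixed opcode.

    The one-byte alternative (opcode 0 reserved) only covers 1..255.
    """
    if not 1 <= opcode <= 255:
        raise ValueError("the one-byte alternative cannot encode this opcode")
    return len(uleb128_encode(opcode)) - 1


same = sum(1 for op in range(1, 256) if extra_bytes_vs_fixed(op) == 0)
one_more = sum(1 for op in range(1, 256) if extra_bytes_vs_fixed(op) == 1)
print(same, one_more)  # 127 128
print(uleb128_encode(16383).hex(), uleb128_encode(16384).hex())  # ff7f 808001
```

Two ULEB128 bytes already cover vendor opcodes up to 16383, and the encoding keeps growing one byte at a time beyond that.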
A centralized registry of vendor extension opcodes which are in use, maintained on the dwarfstd.org website or another suitable location, could also be implemented as part of this proposal. This would remove the need for vendors to coordinate allocation themselves, and make it simpler to use more than one vendor extension at a time. As the number of available opcodes is unbounded, the registration process could involve very limited review, and would therefore pose a minimal burden on the maintainer of such a registry.

Proposal
========

1) In Section 2.5.1.7, p38, add a new code at the end of the list:

3. DW_OP_user
The DW_OP_user opcode encodes a vendor extension operation. It has at least one operand: a ULEB128 constant identifying a vendor extension operation. The remaining operands are defined by the vendor extension. The vendor extension opcode 0 is reserved and cannot be used by any vendor extension.

The DW_OP_user encoding space can be understood to supplement the space defined by DW_OP_lo_user and DW_OP_hi_user that is allocated by the standard for the same purpose.

2) In Section 7.7.1, p226, add a new row to table 7.9:

Operation  | Code | No. of Operands | Notes
DW_OP_user | TBD  | 1+              | ULEB128 vendor extension opcode, followed by
           |      |                 | vendor-extension-defined operands

--
Dwarf-discuss mailing list
Dwarf-discuss@lists.dwarfstd.org
https://lists.dwarfstd.org/mailman/listinfo/dwarf-discuss
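To illustrate why a consumer cannot skip an unrecognized DW_OP_user operation, here is a minimal stack-machine sketch in Python. DW_OP_user's code is TBD in the proposal, so 0xAA below is an arbitrary stand-in, not an allocated value; vendor opcode 300 and its "push one ULEB128 operand" behavior are a hypothetical registry entry. Apart from the prefix, only DW_OP_lit0..DW_OP_lit31 and DW_OP_plus are implemented.

```python
from typing import Callable

DW_OP_plus = 0x22
DW_OP_lit0, DW_OP_lit31 = 0x30, 0x4F
PLACEHOLDER_DW_OP_USER = 0xAA  # "TBD" in the proposal; arbitrary stand-in

# A vendor handler decodes its own operands: (expr, pos, stack) -> new pos.
VendorHandler = Callable[[bytes, int, list], int]


def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, new position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            return result, pos


def evaluate(expr: bytes, vendor_ops: dict[int, VendorHandler]) -> list[int]:
    stack: list[int] = []
    pos = 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if DW_OP_lit0 <= op <= DW_OP_lit31:
            stack.append(op - DW_OP_lit0)
        elif op == DW_OP_plus:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == PLACEHOLDER_DW_OP_USER:
            ext_op, pos = read_uleb128(expr, pos)
            if ext_op == 0:
                raise ValueError("vendor extension opcode 0 is reserved")
            handler = vendor_ops.get(ext_op)
            if handler is None:
                # No size operand: the operand length is unknown, so the
                # rest of the expression cannot be decoded.
                raise NotImplementedError(f"unknown vendor extension {ext_op}")
            pos = handler(expr, pos, stack)
        else:
            raise NotImplementedError(f"opcode 0x{op:02x} not handled in this sketch")
    return stack


def push_uleb(expr: bytes, pos: int, stack: list) -> int:
    """Hypothetical vendor operation: push one ULEB128 operand."""
    value, pos = read_uleb128(expr, pos)
    stack.append(value)
    return pos


VENDOR_OPS = {300: push_uleb}  # 300 is a made-up registry entry

# DW_OP_lit2; DW_OP_user 300 (ULEB128 0xac 0x02) with operand 40; DW_OP_plus
expr = bytes([0x32, PLACEHOLDER_DW_OP_USER, 0xAC, 0x02, 0x28, DW_OP_plus])
print(evaluate(expr, VENDOR_OPS))  # [42]
```

Because the proposal deliberately omits a size operand, raising an error on an unknown vendor opcode is the only safe behavior here; a pretty-printer would hit the same limitation, which is the trade-off the Background section accepts.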