[Dwarf-Discuss] DWARF and source text embedding

2018-01-31 Thread scott

Hello all,

I am a compiler engineer at AMD, working on tools for debugging online-compiled
programs. The problem I am attempting to solve was brought up previously in the
DWARF Standard issue 161018.1 titled "DWARF-embedded source for online-compiled
programs", and is the result of runtimes like OpenCL doing online compilation
in an environment where it is not desirable (or even feasible) to write
sources to disk. In these cases, it would be useful to support embedding the
source directly in the resulting DWARF. I would like to propose a similar
solution to the one outlined in the above issue, but without structural changes
to the specification.



Add two new optional fields to the file_names prologue of the line table.


Section 6.2.4.1:
Add two bullets after "5. DW_LNCT_MD5"
6. DW_LNCT_has_source
     DW_LNCT_has_source indicates that the value is a boolean which affects the
     interpretation of an accompanying DW_LNCT_source value. When present there
     must be an accompanying DW_LNCT_source value. When true, consumers may use
     the embedded source instead of attempting to discover the source on disk.
     When false, consumers will ignore the DW_LNCT_source value. This code point
     is always paired with a flag form (e.g. DW_FORM_flag or
     DW_FORM_flag_present).
7. DW_LNCT_source
     DW_LNCT_source indicates that the value is a null-terminated string which
     is the original source text of the file. When present there must be an
     accompanying DW_LNCT_has_source value. The string will contain the UTF-8
     encoded source text with '\n' line endings. When the accompanying
     DW_LNCT_has_source value is false, the value of DW_LNCT_source will be the
     empty string. This code point is always paired with a string form (e.g.
     DW_FORM_string, DW_FORM_line_strp, DW_FORM_strp).
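
The pairing rules in the two bullets above can be sketched from both the
producer and the consumer side. This is only an illustration in Python, not
part of the proposal or of the LLVM patches: encode_source and usable_source
are hypothetical helper names, and rejecting text that contains a NUL
character is an assumption (the proposal does not say what a producer should
do with it; a null-terminated string simply cannot carry one).

```python
from typing import Optional


def encode_source(text: str) -> bytes:
    """Producer side: build a DW_LNCT_source value as described above."""
    # Normalise line endings to '\n'.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if "\0" in text:
        # Assumption: a null-terminated string cannot hold an embedded NUL.
        raise ValueError("source text contains a NUL character")
    # UTF-8 encoded, followed by the terminating null byte.
    return text.encode("utf-8") + b"\0"


def usable_source(has_source: Optional[bool],
                  source: Optional[str]) -> Optional[str]:
    """Consumer side: return embedded text, or None to fall back to disk."""
    # Each field requires the other to be present.
    if (has_source is None) != (source is None):
        raise ValueError(
            "DW_LNCT_has_source and DW_LNCT_source must appear together")
    if not has_source:
        # Fields absent, or flag false: ignore any DW_LNCT_source value.
        return None
    return source
```

A consumer that gets None back would look for the file on disk as it does
today.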

New type codes can be allocated for them in a backwards-compatible way, or
codes for these new content types can be added in the range of
[DW_LNCT_lo_user, DW_LNCT_hi_user] to avoid changing the spec itself.

Table 7.27:
Add DW_LNCT_has_source  0x6
Add DW_LNCT_source  0x7
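
For comparison, the two allocation options can be written out next to the
existing DWARF v5 codes. The values 0x6 and 0x7 are the ones proposed here;
the EXAMPLE_VENDOR_* names and values are hypothetical placeholders for a
user-range allocation, chosen only for illustration.

```python
# Content type codes defined by DWARF v5 (Table 7.27).
DW_LNCT_path = 0x1
DW_LNCT_directory_index = 0x2
DW_LNCT_timestamp = 0x3
DW_LNCT_size = 0x4
DW_LNCT_MD5 = 0x5
DW_LNCT_lo_user = 0x2000
DW_LNCT_hi_user = 0x3FFF

# Option 1: the standard codes proposed in this message.
DW_LNCT_has_source = 0x6
DW_LNCT_source = 0x7

# Option 2: hypothetical vendor codes inside the user range.
EXAMPLE_VENDOR_has_source = 0x2000
EXAMPLE_VENDOR_source = 0x2001


def is_user_code(code: int) -> bool:
    """True if the code falls in [DW_LNCT_lo_user, DW_LNCT_hi_user]."""
    return DW_LNCT_lo_user <= code <= DW_LNCT_hi_user
```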

Any DWARFv5 consumer which is unaware of this extension would continue to
operate as before, ignoring the new fields. Any consumer which is aware of the
extension would know to check DW_LNCT_has_source for each file_name entry in
order to determine whether the embedded source field (DW_LNCT_source) contains
the source text of the corresponding file.
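
The reason an unaware consumer keeps working is that the DWARF v5 file name
entry format pairs every content type code with a form, so a field can be
skipped by its form alone. A minimal sketch of that idea in Python follows.
It is not the LLVM implementation: the function names are hypothetical, only
four inline forms are handled, and 0x6/0x7 are the values proposed above
rather than published codes.

```python
DW_LNCT_path = 0x1
DW_LNCT_directory_index = 0x2
DW_LNCT_has_source = 0x6   # proposed in this message
DW_LNCT_source = 0x7       # proposed in this message

DW_FORM_string = 0x08
DW_FORM_flag = 0x0C
DW_FORM_udata = 0x0F
DW_FORM_data16 = 0x1E


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_form(buf, pos, form):
    """Decode one value of the given form; return (value, next position)."""
    if form == DW_FORM_string:
        end = buf.index(0, pos)          # find the terminating null byte
        return buf[pos:end].decode("utf-8"), end + 1
    if form == DW_FORM_flag:
        return buf[pos] != 0, pos + 1
    if form == DW_FORM_udata:
        return read_uleb128(buf, pos)
    if form == DW_FORM_data16:
        return bytes(buf[pos:pos + 16]), pos + 16
    raise ValueError(f"form 0x{form:x} is not handled in this sketch")


def read_file_entry(buf, pos, entry_format, known_types):
    """Read one file name entry, keeping only content types in known_types."""
    entry = {}
    for content_type, form in entry_format:
        # The form alone says how many bytes to consume, so unknown
        # content types are decoded and then dropped.
        value, pos = read_form(buf, pos, form)
        if content_type in known_types:
            entry[content_type] = value
    return entry, pos
```

Calling read_file_entry with known_types limited to the DWARF v5 codes models
an unaware consumer; adding the two proposed codes to the set models an aware
one. Both stop at the same offset after reading the same entry.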



My team and I believe this simplifies the design by removing the need for
changes to the compile unit sections, and by avoiding the addition of multiple
file_name_entry_formats in a single program, all without sacrificing any
information. We have a preliminary implementation in LLVM/Clang, which supports
embedding source (clang -gdwarf-5 -gembed-source) and inspecting it via
llvm-dwarfdump and llvm-objdump (with the -source flag). The patches are
available at https://reviews.llvm.org/D42765 (LLVM) and
https://reviews.llvm.org/D42766 (Clang).

I would like any and all feedback on the design, and want to see about the
possibility of adding the new content type codes outside of the "user" range
(i.e. adding new entries for them in Table 7.27) in the next version of the
specification.

Regards,
Scott Linder

___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org


Re: [Dwarf-Discuss] DWARF and source text embedding

2018-01-31 Thread Michael Eager

Hi Scott --

Please submit your proposal at http://dwarfstd.org/Comment.php.



On 01/31/2018 02:05 PM, sc...@scottlinder.com wrote:

> [...]



--
Michael Eager    ea...@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306


Re: [Dwarf-Discuss] DWARF and source text embedding

2018-01-31 Thread John DelSignore

Hi Scott,

Question: What does the DW_TAG_compile_unit look like for an embedded source
file? For example, what do the DW_AT_name and DW_AT_comp_dir look like?

Cheers, John D.


On 01/31/18 17:05, sc...@scottlinder.com wrote:
> [...]
