[Dwarf-Discuss] DWARF and source text embedding

2018-01-31 Thread scott

Hello all,

I am a compiler engineer at AMD, working on tools for debugging
online-compiled programs. The problem I am attempting to solve was brought up
previously in the DWARF Standard issue 161018.1 titled "DWARF-embedded source
for online-compiled programs", and is the result of runtimes like OpenCL doing
online compilation in an environment where it is not desirable (or even
feasible) to write sources to disk. In these cases, it would be useful to
support embedding the source directly in the resulting DWARF. I would like to
propose a similar solution to the one outlined in the above issue, but without
structural changes to the specification.



Add two new optional fields to the file_names prologue of the line table.


Section 6.2.4.1:
Add two bullets after "5. DW_LNCT_MD5"
6. DW_LNCT_has_source
    DW_LNCT_has_source indicates that the value is a boolean which affects the
    interpretation of an accompanying DW_LNCT_source value. When present there
    must be an accompanying DW_LNCT_source value. When true, consumers may use
    the embedded source instead of attempting to discover the source on disk.
    When false, consumers will ignore the DW_LNCT_source value. This code point
    is always paired with a flag form (e.g. DW_FORM_flag or
    DW_FORM_flag_present).
7. DW_LNCT_source
    DW_LNCT_source indicates that the value is a null-terminated string which
    is the original source text of the file. When present there must be an
    accompanying DW_LNCT_has_source value. The string will contain the UTF-8
    encoded source text with '\n' line endings. When the accompanying
    DW_LNCT_has_source value is false, the value of DW_LNCT_source will be the
    empty string. This code point is always paired with a string form (e.g.
    DW_FORM_string, DW_FORM_line_strp, DW_FORM_strp).

New type codes can be allocated for them in a backwards-compatible way, or
codes for these new content types can be added in the range of
[DW_LNCT_lo_user, DW_LNCT_hi_user] to avoid changing the spec itself.

Table 7.27:
Add DW_LNCT_has_source  0x6
Add DW_LNCT_source  0x7

Any DWARFv5 consumer which is unaware of this extension would continue to
operate as before, ignoring the new fields. Any consumer which is aware of the
extension would know to check DW_LNCT_has_source for each file_name entry in
order to determine whether the embedded source field (DW_LNCT_source) contains
the source text of the corresponding file.
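
As a rough illustration of the consumer-side check described above, here is a
minimal Python sketch. The FileEntry class and the embedded_source helper are
hypothetical names made up for this example, and the two code values are the
ones proposed in this message, not codes from a published standard.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

# Content type codes as proposed for Table 7.27 in this message (assumption:
# they are proposal values, not part of a published DWARF standard).
DW_LNCT_has_source = 0x6
DW_LNCT_source = 0x7


@dataclass
class FileEntry:
    """Hypothetical decoded file_names entry: content type code -> value."""
    path: str
    fields: Dict[int, Union[bool, str]] = field(default_factory=dict)


def embedded_source(entry: FileEntry) -> Optional[str]:
    """Return the embedded source text, or None to fall back to disk lookup."""
    # A consumer that knows the extension checks the flag first. When the
    # flag is absent or false, any DW_LNCT_source value is ignored.
    if not entry.fields.get(DW_LNCT_has_source, False):
        return None
    # The proposal requires DW_LNCT_source to accompany the flag, so this
    # lookup is expected to succeed for well-formed input.
    return entry.fields.get(DW_LNCT_source)
```

With this shape, an entry whose flag is true and whose source is the empty
string yields "" rather than None, so an empty embedded file stays
distinguishable from a file with no embedded source.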



My team and I believe this simplifies the design by removing the need for
changes to the compile unit sections, and by avoiding the addition of multiple
file_name_entry_formats in a single program, all without sacrificing any
information. We have a preliminary implementation in LLVM/Clang, which supports
embedding source (clang -gdwarf-5 -gembed-source) and inspecting it via
llvm-dwarfdump and llvm-objdump (with the -source flag). The patches are
available at https://reviews.llvm.org/D42765 (LLVM) and
https://reviews.llvm.org/D42766 (Clang).

I would like any and all feedback on the design, and want to see about the
possibility of adding the new content type codes outside of the "user" range
(i.e. adding new entries for them in Table 7.27) in the next version of the
specification.

Regards,
Scott Linder

___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org


Re: [Dwarf-Discuss] DWARF and source text embedding

2018-02-01 Thread scott

Hi John,

In the case where the files are actually available on disk, and the 
source is simply being "cached", the attributes are exactly the same. In 
the case where sources are generated, and so have no true path on disk, 
I would suggest we might just leave the exact meaning to be 
implementation defined; the producer can still provide valuable 
information which will aid in locating where sources originate, such as 
indicating the OpenCL kernel name. Consumers which are unaware of this 
extension will simply fail to find the source (as before), while new 
consumers can at least provide an identifier to distinguish sources.


The remaining attributes (DW_AT_language, DW_AT_producer, etc.) seem 
pretty naturally orthogonal.


Regards,
Scott

On 2018-01-31 14:40, John DelSignore wrote:

Hi Scott,

Question: What does the DW_TAG_compile_unit look like for an embedded
source file? For example, what does the DW_AT_name and DW_AT_comp_dir
look like?

Cheers, John D.




Re: [Dwarf-Discuss] DWARF and source text embedding

2018-02-01 Thread scott

Hi Paul,

My intention was to support an empty source string; I want to be 
explicit about the presence of embedded source for each file.


When reading the spec I did notice places where an empty string can 
indicate the absence of the attribute (e.g. DW_AT_name), but I would 
prefer to be explicit here.


Scott

On 2018-02-01 11:19, paul.robin...@sony.com wrote:


Would a zero-length string indicate something other than 
"has_source=false"?

If not, then a separate has_source flag seems redundant.
--paulr





Re: [Dwarf-Discuss] DWARF and source text embedding

2018-02-13 Thread scott

Michael, Paul,

In the current proposal, it is not an error to have any value (including an
empty string) in the _source attribute when the _has_source flag is true,
which allows for embedding an empty source string.

After seeing more feedback on this point, I think you are right that the extra
flag is unnecessary. Looking at similar attributes like MD5 and how they are
handled I think it would be best to modify the proposal to remove the flag and
require the source be present on all files in the same line table if the
attribute is present in the prologue. I still think we should have wording
which indicates an empty string is still a valid value for embedded source,
and should not be interpreted as indicating the absence of embedded source for
that file. This is analogous to the current MD5 attribute, as even 16 null
bytes is a valid MD5. What are your thoughts on this approach?
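
To make the revised rule concrete, here is a small Python sketch of how a
consumer might tell "no embedded source in this line table" apart from "an
embedded, empty file". The source_for_file helper and the list-based
representation of the entry format are assumptions for illustration only;
DW_LNCT_source uses the value proposed earlier in this thread.

```python
from typing import List, Optional

DW_LNCT_path = 0x1    # DWARF v5 content type code
DW_LNCT_source = 0x7  # value proposed in this thread (assumption)


def source_for_file(entry_format: List[int],
                    entry_values: List[object]) -> Optional[str]:
    """Look up embedded source for one file entry under the flag-less scheme.

    entry_format is the list of content type codes declared in the line table
    header; entry_values holds one value per declared code, in the same order.
    """
    # Presence is decided once per line table by the declared format, the
    # same way DW_LNCT_MD5 is either present for every file or for none.
    if DW_LNCT_source not in entry_format:
        return None
    # Any string, including "", is a valid embedded source.
    value = entry_values[entry_format.index(DW_LNCT_source)]
    return str(value)
```

Under this reading, None means the producer did not embed source at all, and
"" means the producer embedded a file that happens to be empty.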

Scott

On 2018-02-01 17:20, Michael Eager wrote:

On 02/01/2018 12:01 PM, sc...@scottlinder.com wrote:

Hi Paul,

My intention was to support an empty source string; I want to be 
explicit about the presence of embedded source for each file.


I'm not fond of the belt and suspenders approach.  If there is one
specifier for an attribute, there's no need for a second to say that
it's valid.  There's always the issue of what it means when the two
attributes disagree, such as when you have a flag saying that there
is embedded source, but the source string is empty.  Is that an error?


Re: [Dwarf-Discuss] DWARF and source text embedding

2018-02-13 Thread scott

Michael,

In the case of this proposal, then, I suggest the CU fields
(AT_{name,comp_dir}) retain their exact current definitions. Language
implementations, regardless of whether they might want to support embedding
source, currently use the filesystem. This extension is essentially just
caching source which may become unavailable to the consumer by the time the
program is debugged. This means the producer can put standard values in each
CU field, and also embed source in the line table. If in the future there is a
need to add CU fields or modify existing ones to capture some other attribute,
that can be done in a different proposal.

Scott

On 2018-02-01 17:32, Michael Eager wrote:

On 02/01/2018 08:07 AM, sc...@scottlinder.com wrote:

Hi John,

In the case where the files are actually available on disk, and the 
source is simply being "cached", the attributes are exactly the same. 
In the case where sources are generated, and so have no true path on 
disk, I would suggest we might just leave the exact meaning to be 
implementation defined; the producer can still provide valuable 
information which will aid in locating where sources originate, such 
as indicating the OpenCL kernel name. Consumers which are unaware of 
this extension will simply fail to find the source (as before), while 
new consumers can at least provide an identifier to distinguish 
sources.


Implementation-defined generally means that different implementations
will be incompatible.  Incompatible implementations are the antithesis
of a standard.

As a general DWARF principle, there should be no secret understandings
between producer and consumer. There should be no "secret handshake"
such as the one you describe where a producer provides "valuable
information" in some undefined manner usable only by a consumer which
is "in on the secret".  It's not that a different consumer doesn't
implement the extension, it's that a different consumer cannot implement
the extension.

Attributes which have a defined meaning, such as AT_name or AT_comp_dir,
should have a well defined meaning in all circumstances.




[Dwarf-discuss] Enhancement: Expression Operation Vendor Extensibility Opcode

2023-03-24 Thread Linder, Scott via Dwarf-discuss

Background
==========

The vendor extension encoding space for DWARF expression operations
accommodates only 32 unique operations. In practice, the lack of a central
registry and a desire for backwards compatibility means vendor extensions are
never retired, even when standard versions are accepted into DWARF proper. This
has produced a situation where the effective encoding space available for new
vendor extensions is minuscule today.

To expand this encoding space we propose defining one DWARF operation in the
official encoding space which acts as a "prefix" for vendor extensions. It is
followed by a ULEB128 encoded vendor extension opcode, which is then followed
by the operands of the corresponding vendor extension operation.

This scheme opens up an infinite encoding space for arbitrary vendor
extensions, and in practical terms is no less compact than if a fixed-size
encoding were chosen, as was done for DW_LNS_extended_op. That is to say, when
compared with an alternative scheme which encodes the opcode with a single
unsigned byte: for the first 127 opcodes our approach is indistinguishable from
the alternative scheme; for the next 128 opcodes it requires one more byte than
that alternative scheme; and after 255 opcodes the alternative scheme is
exhausted.
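
The size comparison above can be checked with a short Python sketch of ULEB128
encoding. The function name encode_uleb128 is chosen for this example; the
encoding itself is the standard unsigned LEB128 scheme used throughout DWARF.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low seven bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(byte)         # final byte: continuation bit clear
            return bytes(out)


# Opcode 0 is reserved by the proposal, so usable opcodes start at 1.
one_byte = [op for op in range(1, 256) if len(encode_uleb128(op)) == 1]
two_byte = [op for op in range(1, 256) if len(encode_uleb128(op)) == 2]
```

Here one_byte covers opcodes 1 through 127 and two_byte covers 128 through
255, which matches the 127 and 128 opcode counts quoted in the comparison.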

Since vendor extension operations can have arbitrary semantics, the consumer
must understand them to be able to continue evaluating the expression. The only
use for a size operand would be for a consumer that only needs to print the
expression. Omitting a size operand makes the operation encoding more compact,
and this was deemed more important than the limited printing use case.
Therefore no ULEB128 size operand is present to provide the number of bytes of
following operands, unlike DW_LNS_extended_op.

A centralized registry of vendor extension opcodes which are in use, maintained
on the dwarfstd.org website or another suitable location, could also be
implemented as a part of this proposal. This would remove the need for vendors
to coordinate allocation themselves, and make it simpler to use more than one
vendor extension at a time. As there is support for an infinite number of
opcodes, the registration process could involve very limited review, and would
therefore pose a minimal burden to the maintainer of such a registry.

Proposal
========

1) In Section 2.5.1.7, p38, add a new code at the end of the list:

3. DW_OP_user
The DW_OP_user opcode encodes a vendor extension operation. It has at
least one operand: a ULEB128 constant identifying a vendor extension
operation. The remaining operands are defined by the vendor extension.
The vendor extension opcode 0 is reserved and cannot be used by any
vendor extension.

The DW_OP_user encoding space can be understood to supplement the
space defined by DW_OP_lo_user and DW_OP_hi_user that is allocated by
the standard for the same purpose.

2) In Section 7.7.1, p226, add a new row to table 7.9:

DW_OP_user | TBD | 1+ | ULEB128 vendor extension opcode, followed by
           |     |    | vendor-extension-defined operands
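
For illustration, a consumer might decode the proposed prefix along these
lines. This is only a sketch: the DW_OP_user code is still TBD, so it is passed
in as a parameter rather than hard-coded, and read_vendor_opcode and
decode_uleb128 are hypothetical helper names, not part of any existing library.

```python
from typing import Tuple


def decode_uleb128(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode one ULEB128 value; return (value, offset past the value)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated ULEB128 value")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: last byte
            return result, offset
        shift += 7


def read_vendor_opcode(expr: bytes, offset: int,
                       dw_op_user: int) -> Tuple[int, int]:
    """Read a DW_OP_user prefix and the vendor extension opcode after it.

    dw_op_user is whatever code is eventually assigned (TBD in the proposal).
    Returns (vendor opcode, offset of the vendor-defined operands).
    """
    if offset >= len(expr) or expr[offset] != dw_op_user:
        raise ValueError("not a DW_OP_user operation")
    opcode, next_offset = decode_uleb128(expr, offset + 1)
    if opcode == 0:
        raise ValueError("vendor extension opcode 0 is reserved")
    # No size operand follows, so the caller must know this vendor opcode
    # in order to parse its operands and continue evaluation.
    return opcode, next_offset
```

Because there is no size operand, a consumer that does not recognize the
returned vendor opcode has no way to skip its operands and would have to stop
evaluating the expression at that point.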
-- 
Dwarf-discuss mailing list
Dwarf-discuss@lists.dwarfstd.org
https://lists.dwarfstd.org/mailman/listinfo/dwarf-discuss