Re: [PATCH v2] libebl: recognize FDO Packaging Metadata ELF note

2021-11-30 Thread Mark Wielaard
Hi Luca,

On Thu, 2021-11-25 at 17:02 +, Luca Boccassi via Elfutils-devel
wrote:
> +/* Packaging metadata as defined on 
> > https://systemd.io/COREDUMP_PACKAGE_METADATA/ */
> > +#define FDO_PACKAGING_METADATA 0xcafe1a7e
> > +
> >  /* Note section name of program property.   */
> >  #define NOTE_GNU_PROPERTY_SECTION_NAME ".note.gnu.property"
> 
> I could move this definition to an internal header if the change to
> elf.h blocks this patch, if you prefer? Let me know.

It looks like it will be integrated into glibc elf.h later this week.
I'll resync elf.h then and apply the other half of your patch.

While looking at how to implement the json parsing of the note data I
noticed a couple of things that could be clarified in the spec to make
this more clear and less ambiguous.

The spec says "a key-value JSON format", "JSON payload" and "a JSON
string with the structure described below". Which isn't very exact, or
seems to describe multiple different JSON concepts which aren't exactly
the same thing. A JSON string is something different from a JSON object
(which is the only JSON value that has a key-value format). And it
doesn't really make clear what the encoding of the JSON itself is (it
cannot be a JSON string, because that itself is expressed in an
specific character encoding itself).

What I think the spec should say is something like:
"The note data is a single JSON object encoded as a zero terminated
UTF-8 string."

The spec does explain the requirements for JSON numbers, but doesn't
mention any for JSON strings or JSON objects. It would be good to also
explain how to make the strings and objects unambiguous.

For Objects it should require that all names are unique. See section 4
in rfc8259.

For Strings it should require that \u escaping isn't used and that
only control characters that have an explicit escape sequence are used
(\b, \n, \f, \r, \t) [if you allow them in the first place, they are
probably confusing and it may be good to simply say that Strings should
not contain any control characters or use \u escaping]. See section
7 and section 8 in rfc8259.

That should get rid of various corner cases that different parsers are
known to get wrong. Especially \u escaping is really confusing when
using the UTF-8 encoding (and it is totally necessary since UTF-8 can
express any valid UTF character already).

Cheers,

Mark


Re: [PATCH v2] libebl: recognize FDO Packaging Metadata ELF note

2021-11-30 Thread Luca Boccassi via Elfutils-devel
On Tue, 2021-11-30 at 12:25 +0100, Mark Wielaard wrote:
> Hi Luca,
> 
> On Thu, 2021-11-25 at 17:02 +, Luca Boccassi via Elfutils-devel
> wrote:
> > +/* Packaging metadata as defined on 
> > > https://systemd.io/COREDUMP_PACKAGE_METADATA/ */
> > > +#define FDO_PACKAGING_METADATA 0xcafe1a7e
> > > +
> > >  /* Note section name of program property.   */
> > >  #define NOTE_GNU_PROPERTY_SECTION_NAME ".note.gnu.property"
> > 
> > I could move this definition to an internal header if the change to
> > elf.h blocks this patch, if you prefer? Let me know.
> 
> It looks like it will be integrated into glibc elf.h later this week.
> I'll resync elf.h then and apply the other half of your patch.
> 
> While looking at how to implement the json parsing of the note data I
> noticed a couple of things that could be clarified in the spec to make
> this more clear and less ambiguous.
> 
> The spec says "a key-value JSON format", "JSON payload" and "a JSON
> string with the structure described below". Which isn't very exact, or
> seems to describe multiple different JSON concepts which aren't exactly
> the same thing. A JSON string is something different from a JSON object
> (which is the only JSON value that has a key-value format). And it
> doesn't really make clear what the encoding of the JSON itself is (it
> cannot be a JSON string, because that itself is expressed in an
> specific character encoding itself).
> 
> What I think the spec should say is something like:
> "The note data is a single JSON object encoded as a zero terminated
> UTF-8 string."
> 
> The spec does explain the requirements for JSON numbers, but doesn't
> mention any for JSON strings or JSON objects. It would be good to also
> explain how to make the strings and objects unambiguous.
> 
> For Objects it should require that all names are unique. See section 4
> in rfc8259.
> 
> For Strings it should require that \u escaping isn't used and that
> only control characters that have an explicit escape sequence are used
> (\b, \n, \f, \r, \t) [if you allow them in the first place, they are
> probably confusing and it may be good to simply say that Strings should
> not contain any control characters or use \u escaping]. See section
> 7 and section 8 in rfc8259.
> 
> That should get rid of various corner cases that different parsers are
> known to get wrong. Especially \u escaping is really confusing when
> using the UTF-8 encoding (and it is totally necessary since UTF-8 can
> express any valid UTF character already).
> 
> Cheers,
> 
> Mark

Thanks, see:

https://github.com/systemd/systemd/pull/21578

-- 
Kind regards,
Luca Boccassi


signature.asc
Description: This is a digitally signed message part


Re: [PATCH v2] libebl: recognize FDO Packaging Metadata ELF note

2021-11-30 Thread Frank Ch. Eigler via Elfutils-devel
Hi -

On Tue, Nov 30, 2021 at 12:25:41PM +0100, Mark Wielaard wrote:
> [...]
> The spec does explain the requirements for JSON numbers, but doesn't
> mention any for JSON strings or JSON objects. It would be good to also
> explain how to make the strings and objects unambiguous. [...]
> For Objects it should require that all names are unique. [...]
> For Strings it should require that \u escaping isn't used [...]
> 
> That should get rid of various corner cases that different parsers are
> known to get wrong. 

Are such buggy parsers seen in the wild, now, after all this time with
JSON?  It seems to me it's not really elfutils' or systemd's place to
impose -additional- constraints on customary JSON.


> Especially \u escaping is really confusing when using the UTF-8
> encoding (and it is totally necessary since UTF-8 can express any
> valid UTF character already).

Yes, and yet we have had the bidi situation recently where UTF-8 raw
codes could visually confuse a human reader whereas escaped \u
wouldn't.  If we forbid \u unilaterally, we literally become
incompatible with JSON (RFC8259 7. String. "Any character may be
escaped."), and for what?

Both these seem to be personal aesthetic decisions that need not be
imposed on a routine application of well-established standards.  We
have many industrial-strength json parser libraries to choose from,
so that part is IMO no longer a timely concern.


- FChE



Re: [PATCH v2] libebl: recognize FDO Packaging Metadata ELF note

2021-11-30 Thread Florian Weimer via Elfutils-devel
* Frank Ch. Eigler via Elfutils-devel:

> Hi -
>
> On Tue, Nov 30, 2021 at 12:25:41PM +0100, Mark Wielaard wrote:
>> [...]
>> The spec does explain the requirements for JSON numbers, but doesn't
>> mention any for JSON strings or JSON objects. It would be good to also
>> explain how to make the strings and objects unambiguous. [...]
>> For Objects it should require that all names are unique. [...]
>> For Strings it should require that \u escaping isn't used [...]
>> 
>> That should get rid of various corner cases that different parsers are
>> known to get wrong. 
>
> Are such buggy parsers seen in the wild, now, after all this time with
> JSON?  It seems to me it's not really elfutils' or systemd's place to
> impose -additional- constraints on customary JSON.

JSON has been targeted at the Windows/Java UTF-16 world, there is always
going to be a mismatch if you try to represent it in UTF-8 or anything
that doesn't have surrogate pairs.

>> Especially \u escaping is really confusing when using the UTF-8
>> encoding (and it is totally necessary since UTF-8 can express any
>> valid UTF character already).
>
> Yes, and yet we have had the bidi situation recently where UTF-8 raw
> codes could visually confuse a human reader whereas escaped \u
> wouldn't.  If we forbid \u unilaterally, we literally become
> incompatible with JSON (RFC8259 7. String. "Any character may be
> escaped."), and for what?

RFC 8259 says this:

   However, the ABNF in this specification allows member names and
   string values to contain bit sequences that cannot encode Unicode
   characters; for example, "\uDEAD" (a single unpaired UTF-16
   surrogate).  Instances of this have been observed, for example, when
   a library truncates a UTF-16 string without checking whether the
   truncation split a surrogate pair.  The behavior of software that
   receives JSON texts containing such values is unpredictable; for
   example, implementations might return different values for the length
   of a string value or even suffer fatal runtime exceptions.

A UTF-8 environment has to enforce *some* additional constraints
compared to the official JSON syntax.

Thanks,
Florian



Re: [PATCH v2] libebl: recognize FDO Packaging Metadata ELF note

2021-11-30 Thread Mark Wielaard
Hi,

On Tue, Nov 30, 2021 at 05:49:58PM +0100, Florian Weimer wrote:
> * Frank Ch. Eigler via Elfutils-devel:
> > On Tue, Nov 30, 2021 at 12:25:41PM +0100, Mark Wielaard wrote:
> >> [...]
> >> The spec does explain the requirements for JSON numbers, but doesn't
> >> mention any for JSON strings or JSON objects. It would be good to also
> >> explain how to make the strings and objects unambiguous. [...]
> >> For Objects it should require that all names are unique. [...]
> >> For Strings it should require that \u escaping isn't used [...]
> >> 
> >> That should get rid of various corner cases that different parsers are
> >> known to get wrong. 
> >
> > Are such buggy parsers seen in the wild, now, after all this time with
> > JSON?  It seems to me it's not really elfutils' or systemd's place to
> > impose -additional- constraints on customary JSON.
> 
> JSON has been targeted at the Windows/Java UTF-16 world, there is always
> going to be a mismatch if you try to represent it in UTF-8 or anything
> that doesn't have surrogate pairs.

Right, the \uhex escaping is kind of odd when using UTF-8 because you
suddenly have to use surrogate pairs. It is correct that you may use
them according to the RFC. But the RFC also points out that if you do
there are various odd/bad corner cases to take care of and that it
makes comparing strings really awkward.

> >> Especially \u escaping is really confusing when using the UTF-8
> >> encoding (and it is totally necessary since UTF-8 can express any
> >> valid UTF character already).
> >
> > Yes, and yet we have had the bidi situation recently where UTF-8 raw
> > codes could visually confuse a human reader whereas escaped \u
> > wouldn't.  If we forbid \u unilaterally, we literally become
> > incompatible with JSON (RFC8259 7. String. "Any character may be
> > escaped."), and for what?
> 
> RFC 8259 says this:
> 
>However, the ABNF in this specification allows member names and
>string values to contain bit sequences that cannot encode Unicode
>characters; for example, "\uDEAD" (a single unpaired UTF-16
>surrogate).  Instances of this have been observed, for example, when
>a library truncates a UTF-16 string without checking whether the
>truncation split a surrogate pair.  The behavior of software that
>receives JSON texts containing such values is unpredictable; for
>example, implementations might return different values for the length
>of a string value or even suffer fatal runtime exceptions.
> 
> A UTF-8 environment has to enforce *some* additional constraints
> compared to the official JSON syntax.

Right, all recommendations for encoding the JSON object, string and
numbers actually come from the RFC. I don't think it is odd to
explicitly specify the encoding of the package note as using a zero
terminated UTF-8 string and to specify the allowed JSON string and
number encodings. The contents of the package note is meant to be
human readable, so excluding control characters and excotic escape
sequence in the expected key/value pairs seems natural to me.

Cheers,

Mark