* Frank Ch. Eigler via Elfutils-devel:

> Hi -
>
> On Tue, Nov 30, 2021 at 12:25:41PM +0100, Mark Wielaard wrote:
>> [...]
>> The spec does explain the requirements for JSON numbers, but doesn't
>> mention any for JSON strings or JSON objects. It would be good to also
>> explain how to make the strings and objects unambiguous. [...]
>> For Objects it should require that all names are unique. [...]
>> For Strings it should require that \uXXXX escaping isn't used [...]
>>
>> That should get rid of various corner cases that different parsers are
>> known to get wrong.
>
> Are such buggy parsers seen in the wild, now, after all this time with
> JSON? It seems to me it's not really elfutils' or systemd's place to
> impose -additional- constraints on customary JSON.

JSON has been targeted at the Windows/Java UTF-16 world, so there is
always going to be a mismatch if you try to represent it in UTF-8 or
anything else that doesn't have surrogate pairs.

>> Especially \uXXXX escaping is really confusing when using the UTF-8
>> encoding (and it is totally necessary since UTF-8 can express any
>> valid UTF character already).
>
> Yes, and yet we have had the bidi situation recently where UTF-8 raw
> codes could visually confuse a human reader whereas escaped \uXXXX
> wouldn't. If we forbid \uXXXX unilaterally, we literally become
> incompatible with JSON (RFC8259 7. String. "Any character may be
> escaped."), and for what?

RFC 8259 says this:

   However, the ABNF in this specification allows member names and
   string values to contain bit sequences that cannot encode Unicode
   characters; for example, "\uDEAD" (a single unpaired UTF-16
   surrogate).  Instances of this have been observed, for example,
   when a library truncates a UTF-16 string without checking whether
   the truncation split a surrogate pair.  The behavior of software
   that receives JSON texts containing such values is unpredictable;
   for example, implementations might return different values for the
   length of a string value or even suffer fatal runtime exceptions.

A UTF-8 environment has to enforce *some* additional constraints
compared to the official JSON syntax.

Thanks,
Florian
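
For illustration, the unpaired-surrogate case from the RFC excerpt can be
reproduced with a small sketch. It assumes Python's standard json module
as a stand-in parser, not anything elfutils or systemd actually uses, and
the helper names lone_surrogates and is_utf8_safe are hypothetical.

```python
import json


def lone_surrogates(text: str) -> list:
    """Return the indices of surrogate code points left in a decoded string."""
    return [i for i, ch in enumerate(text) if 0xD800 <= ord(ch) <= 0xDFFF]


def is_utf8_safe(json_text: str) -> bool:
    """Hypothetical check: True if a JSON string value can be stored as UTF-8.

    Assumes json_text is a JSON document consisting of a single string.
    """
    return not lone_surrogates(json.loads(json_text))


# A properly paired escape is combined into one character by the decoder.
paired = json.loads('"\\ud83d\\ude00"')

# The RFC's example: accepted by the JSON grammar, but not a Unicode character.
unpaired = json.loads('"\\uDEAD"')

# Writing the decoded value out as UTF-8 is where the mismatch surfaces.
try:
    unpaired.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

In this sketch the parser accepts "\uDEAD" without complaint and the
failure only appears later, when the value is encoded as UTF-8. A check
such as is_utf8_safe is one possible form of the additional constraint,
and it does not require forbidding \uXXXX escapes in general.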