Control: forwarded -1 https://rt.cpan.org/Ticket/Display.html?id=123653

On Sun, Nov 05, 2017 at 04:31:19AM +0800, 積丹尼 Dan Jacobson wrote:
> X-Debbugs-Cc: makam...@cpan.org
> Package: perl
> Version: 5.26.1-2
> File: /usr/bin/json_pp
>
> This command line utility should have all character set issues already
> solved internally, no?
>
> $ set http://radioscanningtw.jidanni.org/index.php?title=%E9%A6%96%E9%A0%81
> $ GET http://archive.org/wayback/available?url=$@
> {"url": "http://radioscanningtw.jidanni.org/index.php?title=\u9996\u9801",
> "archived_snapshots": {"closest": {"status": "200", "available": true, "url":
> "http://web.archive.org/web/20171104183618/http://radioscanningtw.jidanni.org/index.php?title=%E9%A6%96%E9%A0%81",
> "timestamp": "20171104183618"}}}
>
> $ GET http://archive.org/wayback/available?url=$@ | json_pp
> Wide character in print at /usr/bin/json_pp line 82, <STDIN> chunk 1.

It looks like this is working as advertised. From json_pp(1):

"
  -json_opt
    options to JSON::PP

    Acceptable options are:
      ascii latin1 utf8 pretty indent space_before space_after relaxed
      canonical allow_nonref allow_singlequote allow_barekey allow_bignum
      loose escape_slash
"

From JSON::PP(3perl):

"
  utf8
    $json = $json->utf8([$enable])

    $enabled = $json->get_utf8

    If $enable is true (or missing), then the encode method will encode
    the JSON result into UTF-8, as required by many protocols, while the
    decode method expects to be handled an UTF-8-encoded string. Please
    note that UTF-8-encoded strings do not contain any characters outside
    the range 0..255, they are thus useful for bytewise/binary I/O.

    (In Perl 5.005, any character outside the range 0..255 does not
    exist. See to "UNICODE HANDLING ON PERLS".)

    In future versions, enabling this option might enable autodetection
    of the UTF-16 and UTF-32 encoding families, as described in RFC4627.

    If $enable is false, then the encode method will return the JSON
    string as a (non-encoded) Unicode string, while decode expects thus a
    Unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16)
    needs to be done yourself, e.g. using the Encode module.
"

I do agree that the requirement to supply that flag is not intuitive,
although I'm not sure whether this is easily fixable. For some output
formats I can see that it would not make sense to always pass the utf8
flag (for example the second example in the json_pp manpage), but
perhaps it could be a bit clever in situations where it ends up
printing non-ASCII characters to the terminal.

I've forwarded this upstream to see whether it is practical to make
this more user-friendly.

Dominic.