On Sun, 05 Nov 2017 18:32:48 +0100 Dominique Dumont <d...@debian.org> wrote:
> On Monday, 30 October 2017 15:27:32 CET you wrote:
> > YAML::XS::Load (and *hopefully* the other implementations of
> > YAML::Any::Load?) expect utf8 octets on input, not perl's internal
> > encoding.
>
> Uh ? I thought I had gotten rid of YAML::Any... Well, after checking, it
> turns out that I've updated Config::Model::Backend::Yaml, but I forgot to
> update Dpkg::Scanner.
>
> Anyway, using YAML::Any has several problems:
> - it's deprecated
> - it may load YAML or YAML::XS which have some security issues [1]
>
> > Thus, slurp_raw should be used instead of slurp_utf8. [Though really,
> > YAML::XS::Load should probably do the right thing if is_utf8 is on,
> > anyway.]
>
> Unfortunately, the strings returned by YAML::XS is not tagged as utf-8, which
> leads to writing mojibake when cme is used to update debian/copyright.
>
> Given the security issues of YAML and YAML::XS, I'm not going to tweak the
> structure returned by YAML::XS to fix the utf8 flag of each scalar contained
> the structure (and may be all hash keys ..)
>
> Instead, I'm going to replace YAML::Any with YAML::Tiny (which is more than
> enough in this case).
Unfortunately, YAML::Tiny disallows some valid YAML markup, in particular
what pyyaml generates by default, which is very difficult to change
without in-depth hacking of pyyaml itself:

".*":
  "license": |-
    GPL-2
"debian/":
  "copyright": "A B <a@a>\n B C <b@b>\n C\
    \ D <c@c>\n D E <d@d>\n E F\
    \ <e@e>\n F G <f@f>\n G H <g@g>"
  "license": |-
    GPL-2+

As a temporary workaround, I patched the locally used version to use
YAML::XS, but I see that you won’t accept this patch upstream. Is there a
solution that would satisfy both conditions: not having security issues and
supporting proper YAML?

By the way, what are those security issues, and how serious and relevant
to scan-copyrights are they?

> Thanks for the report. This helps me improve dpkg model for cme (and led to
> the release of Config::Model::Tester 3.003 which did not handle utf-8
> correctly while checking file content).

-- 
Cheers,
  Andrej