Adam Borowski <kilob...@angband.pl> writes:

> Here's an experimental tag, a step towards elimination of mojibake
> system-wide.  It checks all text files in *bin/, /usr/share/doc/ and
> those that look like a script file.  "Text" is defined as not having any
> bytes in the 0..31 range other than tabs, newlines (incl. Windows ones)
> or form feeds.  In practice, this definition appears to work pretty
> well, although the list of files that should be skipped despite being
> text needs work.

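A minimal sketch of the "text" definition quoted above, written in Python purely for illustration (Lintian itself is written in Perl). The function names are hypothetical. The final step, treating a text file that fails UTF-8 decoding as the thing to tag, is an assumption based on the "UTF-8 everywhere" goal and is not taken from the actual check.

```python
# Control bytes the quoted definition tolerates in a "text" file:
# tab, LF, CR (for Windows line endings), and form feed.
ALLOWED_CONTROL = frozenset(b"\t\n\r\f")


def looks_like_text(data: bytes) -> bool:
    """Return True if no byte falls in 0..31 apart from the allowed ones."""
    return not any(b < 32 and b not in ALLOWED_CONTROL for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def would_be_tagged(data: bytes) -> bool:
    """Assumed decision: text by the heuristic, yet not valid UTF-8."""
    return looks_like_text(data) and not is_valid_utf8(data)
```

Under these assumptions, a Shift-JIS README counts as text (it contains no stray control bytes) but fails UTF-8 decoding, so it would be flagged. A binary containing NUL bytes is skipped before the encoding is ever examined.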
> It's a part of the "UTF-8 everywhere" release goal that I intend to
> re-propose for Stretch.

> This is only a preliminary version, let's discuss what you think.  If
> you're on DebConf, you can contact me in person.

The last time I looked at this in a policy context, the distribution
included a few documentation files that were intentionally provided
upstream in multiple different encodings.  In other words, there would be
a README.sjis and a README.utf8, etc., side-by-side.  In those cases, it
feels bad to have Lintian tag the README.sjis file and have maintainers
possibly just not install it, when it might still be a convenience to some
users.

Maybe this check should exclude files that have an extension that
indicates they were intentionally encoded in some other encoding?

-- 
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>
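
A rough sketch of the extension-based exclusion suggested above, in Python for illustration only. The suffix list and the function name are hypothetical. A real check would need a list agreed with maintainers, and nothing here reflects how Lintian is actually implemented.

```python
import os

# Hypothetical suffixes that signal a deliberately chosen legacy encoding.
# This list is an illustrative assumption, not something Lintian ships.
ENCODING_SUFFIXES = frozenset(
    {".sjis", ".eucjp", ".euc-jp", ".big5", ".koi8-r", ".latin1"}
)


def is_intentionally_encoded(path: str) -> bool:
    """Return True if the file name ends in a known legacy-encoding suffix."""
    _, ext = os.path.splitext(os.path.basename(path))
    return ext.lower() in ENCODING_SUFFIXES
```

With this list, README.sjis would be skipped, while README.utf8 and a plain README would still be checked, since neither carries a suffix that announces a non-UTF-8 encoding.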