Adam Borowski <kilob...@angband.pl> writes:

> Here's an experimental tag, a step towards elimination of mojibake
> system-wide.  It checks all text files in *bin/, /usr/share/doc/ and
> those that look like a script file.  "Text" is defined as not having any
> bytes in the 0..31 range other than tabs, newlines (incl. Windows ones)
> or form feeds.  In practice, this definition appears to work pretty
> well, although the list of files that should be skipped despite being
> text needs work.

> It's a part of the "UTF-8 everywhere" release goal that I intend to
> re-propose for Stretch.

> This is only a preliminary version; let's discuss what you think.  If
> you're at DebConf, you can contact me in person.

The last time I looked at this in a policy context, the distribution
included a few documentation files that were intentionally provided
upstream in multiple different encodings.  In other words, there would be
a README.sjis and a README.utf8, etc., side-by-side.  In those cases, it
would be unfortunate for Lintian to tag the README.sjis file and prompt
maintainers to simply stop installing it, when it might still be a
convenience to some users.

Maybe this check should exclude files whose extension indicates that
they were intentionally provided in some other encoding?
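
A minimal sketch of such an exclusion, written in Python purely for
illustration (Lintian itself is written in Perl, and this is not its
code).  It assumes the "text" definition quoted above, and the extension
list and all function names are hypothetical:

```python
import os

# Control bytes the quoted definition allows in "text":
# tab, line feed, form feed, carriage return.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0C, 0x0D}

# Hypothetical list of extensions that signal a deliberate non-UTF-8
# encoding; a real check would need a reviewed list.
ENCODING_EXTENSIONS = {".sjis", ".eucjp", ".euc-jp", ".koi8r", ".big5", ".latin1"}


def looks_like_text(data: bytes) -> bool:
    """True if no byte is in 0..31 other than the allowed control bytes."""
    return all(b >= 32 or b in ALLOWED_CONTROL for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def should_tag(filename: str, data: bytes) -> bool:
    """Tag text files that are not valid UTF-8, unless the extension
    says the other encoding is intentional."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in ENCODING_EXTENSIONS:
        return False
    return looks_like_text(data) and not is_valid_utf8(data)
```

With this, a Shift_JIS-encoded README.sjis would be skipped, while the
same bytes in a plain README would still be tagged.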

-- 
Russ Allbery (r...@debian.org)               <http://www.eyrie.org/~eagle/>
