Bug#575661: lintian: manpage-has-errors-from-man "Invalid or incomplete multibyte or wide character" when OK.

Osamu Aoki Sat, 27 Mar 2010 18:19:21 -0700

Package: lintian
Version: 2.3.4
Severity: normal

I think lintian is wrong in how it tests the encoding of man pages.  I have
installed hello-debhelper (2.5-1).  Then I downloaded its binary package
hello-debhelper_2.5-1_amd64.deb and extracted hello.1 from it into a working
directory.


In short, instead of the complicated test based on man, lintian should use
iconv to test the encoding.

Let me demonstrate the problem.

First, the man page installed by the hello-debhelper package displays fine
under both LANG=C and LANG=en_US.UTF-8.  I tested it with
 $ LANG=C man hello
 $ LANG=en_US.UTF-8 man hello

The only difference is the copyright line.  LANG=C shows the copyright sign
as (C) while UTF-8 uses the fancy ©.  No problem.

But

$ lintian -i hello-debhelper_2.5-1_amd64.deb 
W: hello-debhelper: manpage-has-errors-from-man usr/share/man/man1/hello.1.gz Invalid or incomplete multibyte or wide character
N: 
N:    This man page provokes warnings or errors from man.
N:    
N:    "cannot adjust" or "can't break" are trouble with paragraph filling,
N:    usually related to long lines. Adjustment can be helped by left
N:    justifying, breaks can be helped with hyphenation, see "Manipulating
N:    Filling and Adjusting" and "Manipulating Hyphenation" in the manual.
N:    
N:    "can't find numbered character" usually means latin1 etc in the input,
N:    and this warning indicates characters will be missing from the output.
N:    You can change to escapes like \[:a] described on the groff_char man
N:    page.
N:    
N:    Other warnings are often formatting typos, like missing quotes around a
N:    string argument to .IP. These are likely to result in lost or malformed
N:    output. See the groff_man (or groff_mdoc if using mdoc) man page for
N:    information on macros.
N:    
N:    This test uses man's --warnings option to enable groff warnings that
N:    catch common mistakes, such as putting . or ' characters at the start of
N:    a line when they are intended as literal text rather than groff
N:    commands. This can be fixed either by reformatting the paragraph so that
N:    these characters are not at the start of a line, or by adding a
N:    zero-width space (\&) immediately before them.
N:    
N:    At worst, warning messages can be disabled with the .warn directive, see
N:    "Debugging" in the groff manual.
N:    
N:    To test this for yourself you can use the following command:
N:     LANG=C MANWIDTH=80 man --warnings -E UTF-8 -l manpage-file >/dev/null
N:    
N:    Severity: normal, Certainty: certain
N:
$ LANG=C MANWIDTH=80 man --warnings -E UTF-8 -l hello.1 >hello.txt
col: Invalid or incomplete multibyte or wide character
$ iconv -f utf8 -t utf8 hello.1 >/dev/null && echo "UTF-8 compatible" || echo "non-UTF-8 found"
UTF-8 compatible
$ iconv -f ascii -t ascii hello.1 >/dev/null && echo "ascii compatible" || echo "non-ascii found"
ascii compatible

The first test is the one used by lintian.  The second and third tests are
mine; they check the encoding of the man page source itself.
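
The two iconv tests above can be wrapped in one small shell function.  This
is only a minimal sketch: the name check_encoding is made up for illustration
and is not part of lintian, and it assumes an iconv that exits non-zero on
invalid input, as the one used above does.

```shell
# check_encoding ENCODING FILE   (hypothetical helper, not part of lintian)
# Converts FILE from ENCODING to the same ENCODING and discards the output.
# Prints "ENCODING compatible" and returns 0 if iconv accepts every byte,
# otherwise prints "non-ENCODING found" and returns 1.
check_encoding() {
    enc=$1
    file=$2
    if iconv -f "$enc" -t "$enc" "$file" >/dev/null 2>&1; then
        echo "$enc compatible"
    else
        echo "non-$enc found"
        return 1
    fi
}
```

It would be called as "check_encoding ascii hello.1" or
"check_encoding utf8 hello.1".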

$ LANG=C           MANWIDTH=80 man --warnings          -l hello.1 >hello.c.txt
$ LANG=en_US.UTF-8 MANWIDTH=80 man --warnings -E UTF-8 -l hello.1 >hello.u.txt
$ LANG=C           MANWIDTH=80 man --warnings -E UTF-8 -l hello.1 >hello.cu.txt
col: Invalid or incomplete multibyte or wide character
$ ls -l hello.*.txt
-rw-rw-r-- 1 osamu osamu 1417 Mar 28 09:21 hello.c.txt
-rw-rw-r-- 1 osamu osamu    0 Mar 28 09:53 hello.cu.txt
-rw-rw-r-- 1 osamu osamu 1418 Mar 28 09:21 hello.u.txt
$ diff -u hello.c.txt hello.u.txt
--- hello.c.txt 2010-03-28 09:21:07.000000000 +0900
+++ hello.u.txt 2010-03-28 09:21:26.000000000 +0900
@@ -32,7 +32,7 @@
        General help using GNU software: <http://www.gnu.org/gethelp/>

 COPYRIGHT
-       Copyright  (C) 2010 Free Software Foundation, Inc.  License GPLv3+: GNU
+       Copyright  ©  2010  Free Software Foundation, Inc.  License GPLv3+: GNU
        GPL version 3 or later <http://gnu.org/licenses/gpl.html>
        This is free software: you are free  to  change  and  redistribute  it.
        There is NO WARRANTY, to the extent permitted by law.

The corresponding groff source uses the "\(co" escape, as in

Copyright \(co 2010 Free Software Foundation, Inc.

This is an embedded groff escape, which is handled correctly in both locales.

So the situation is clear.  There are no non-ASCII bytes in the source of
the man page, and the man page is rendered properly by the current tool set.

But the test used by lintian breaks on the groff escape for the copyright
sign.

I made hellox.1, in which "\(co" is replaced with the UTF-8 character "©".
This is a real error :-)
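
For anyone who wants to reproduce this, one way to build such a file is with
sed.  This is a sketch under assumptions: the report does not say how
hellox.1 was actually made, and make_utf8_variant is a hypothetical name.

```shell
# make_utf8_variant IN OUT   (hypothetical helper for reproducing the test)
# Copies IN to OUT, replacing every groff escape \(co with the literal
# UTF-8 copyright sign.  In a basic regular expression "\\" matches one
# backslash and "(" is an ordinary character.
make_utf8_variant() {
    sed 's/\\(co/©/g' "$1" > "$2"
}
```

It would be called as "make_utf8_variant hello.1 hellox.1".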

$ iconv -f ascii -t ascii hellox.1 >/dev/null && echo "ascii compatible" || echo "non-ascii found"
iconv: illegal input sequence at position 828
non-ascii found
$ iconv -f utf8 -t utf8 hellox.1 >/dev/null && echo "UTF-8 compatible" || echo "non-UTF-8 found"
UTF-8 compatible
$ LANG=C           MANWIDTH=80 man --warnings -E UTF-8 -l hellox.1 >hellox.cu.txt
col: Invalid or incomplete multibyte or wide character
$ LANG=en_US.UTF-8 MANWIDTH=80 man --warnings -E UTF-8 -l hellox.1 >hellox.u.txt
$ LANG=C           MANWIDTH=80 man --warnings          -l hellox.1 >hellox.c.txt
$ ls -l hellox.*.txt
-rw-rw-r-- 1 osamu osamu 1417 Mar 28 10:03 hellox.c.txt
-rw-rw-r-- 1 osamu osamu    0 Mar 28 10:02 hellox.cu.txt
-rw-rw-r-- 1 osamu osamu 1418 Mar 28 10:03 hellox.u.txt
$ diff -u hellox.c.txt hellox.u.txt
--- hellox.c.txt        2010-03-28 10:03:28.000000000 +0900
+++ hellox.u.txt        2010-03-28 10:03:12.000000000 +0900
@@ -32,7 +32,7 @@
        General help using GNU software: <http://www.gnu.org/gethelp/>
 
 COPYRIGHT
-       Copyright  (C) 2010 Free Software Foundation, Inc.  License GPLv3+: GNU
+       Copyright  ©  2010  Free Software Foundation, Inc.  License GPLv3+: GNU
        GPL version 3 or later <http://gnu.org/licenses/gpl.html>
        This is free software: you are free  to  change  and  redistribute  it.
        There is NO WARRANTY, to the extent permitted by law.

groff is smart enough to convert the UTF-8 "©" to "(C)" under LANG=C.

A simple iconv test detects the error.
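
Since the man page shipped in the package is gzipped (hello.1.gz), an
iconv-based check could look roughly like the following.  This is a sketch
of the idea only, not a proposed patch: classify_manpage is a hypothetical
name, and the encoding names ASCII and UTF-8 are assumed to be accepted by
the local iconv.

```shell
# classify_manpage FILE   (hypothetical helper, not lintian code)
# Prints "ascii", "utf-8" or "unknown" for the source encoding of FILE.
# Files whose names end in .gz are decompressed on the fly.
classify_manpage() {
    [ -r "$1" ] || return 2
    case $1 in
        *.gz) reader='gzip -dc' ;;
        *)    reader='cat' ;;
    esac
    # $reader is left unquoted on purpose so "gzip -dc" splits into
    # a command and its option.
    if $reader "$1" | iconv -f ASCII -t ASCII >/dev/null 2>&1; then
        echo "ascii"
    elif $reader "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        echo "utf-8"
    else
        echo "unknown"
    fi
}
```

With the files above, hello.1 should be classified as ascii and hellox.1
as utf-8.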


-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-3-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages lintian depends on:
ii  binutils               2.20.1-3          The GNU assembler, linker and bina
ii  diffstat               1.47-1            produces graph of changes introduc
ii  dpkg-dev               1.15.5.6          Debian package development tools
ii  file                   5.04-1            Determines file type using "magic"
ii  gettext                0.17-10           GNU Internationalization utilities
ii  intltool-debian        0.35.0+20060710.1 Help i18n of RFC822 compliant conf
ii  libapt-pkg-perl        0.1.24            Perl interface to libapt-pkg
ii  libclass-accessor-perl 0.34-1            Perl module that automatically gen
ii  libipc-run-perl        0.84-1            Perl module for running processes
ii  libparse-debianchangel 1.1.1-2           parse Debian changelogs and output
ii  libtimedate-perl       1.2000-1          collection of modules to manipulat
ii  liburi-perl            1.53-1            module to manipulate and access UR
ii  locales-all [locales]  2.10.2-6          Embedded GNU C Library: Precompile
ii  man-db                 2.5.7-2           on-line manual pager
ii  perl [libdigest-sha-pe 5.10.1-11         Larry Wall's Practical Extraction 

lintian recommends no packages.

Versions of packages lintian suggests:
pn  binutils-multiarch            <none>     (no description available)
pn  libtext-template-perl         <none>     (no description available)
ii  man-db                        2.5.7-2    on-line manual pager

-- no debconf information


