Hi,

Liviu Daia wrote on Fri, Oct 24, 2014 at 08:37:31AM +0300:
> On 24 October 2014, Ingo Schwarze <schwa...@usta.de> wrote:
>> Gleydson Soares wrote on Thu, Oct 23, 2014 at 09:11:36PM -0300:
>>> On Thu, Oct 23, 2014 at 10:36:44AM -0300, Gonzalo L. Rodriguez wrote:
>>>> -USE_GROFF = Yes
>>> mandoc complains:
>>>
>>> $ mandoc -Tlint -Werror stunnel.8
>>> mandoc: stunnel.8:35:2: ERROR: skipping unknown macro: 'br\&
>>> mandoc: stunnel.8:85:37: ERROR: skipping bad character: 0xc2
>>> mandoc: stunnel.8:85:38: ERROR: skipping bad character: 0xa0
>>> mandoc: stunnel.8:1084:11: ERROR: skipping bad character: 0xc5
>>> mandoc: stunnel.8:1084:12: ERROR: skipping bad character: 0x82
>>> mandoc: stunnel.8:1085:16: ERROR: skipping bad character: 0xc5
>>> mandoc: stunnel.8:1085:17: ERROR: skipping bad character: 0x82
>>> $
>>>
>>> are you sure to zap groff?
>> Yes, it's a perlpod(1) manual, and these particular errors are
>> harmless.
>>
>> - 35:2 has no ill effect, actually, it's a bug in mandoc(1) that
>>   this bogus message is shown, i will look into fixing it.
>> - 85:37-38 is merely a bug in the manual,
>>   two stray gibberish eight bit bytes
> Not really, 0xC2 0xA0 is Unicode "NO-BREAK SPACE":
>
>   http://www.fileformat.info/info/unicode/char/a0/index.htm
>
> There are probably more of these around,

No kidding.

> various *roff tools produce them.

Really? Hopefully not. If you run into tools doing that, please
do report them to me. I am willing to hunt those bugs down and talk
to the upstream maintainers of such broken tools.

In the case at hand, you can claim for sure that Russ Allbery's
pod2man(1) and David Wheeler's Pod::Simple are excessively
complicated, but they are not broken in this respect. They produce
correct output by default.

The problem here is that the stunnel(8) maintainers don't know what
they are doing. In Makefile.in, they pass the -u option (use UTF-8
in the generated roff(7) code) to pod2man(1), even though the manual
explicitly states "Many *roff implementations cannot handle non-ASCII
characters".

That is a massive understatement. I do not know of any implementation
of roff(7) that can handle that.
Definitely no version of groff or mandoc ever could, and the
upcoming releases of these two (groff-1.22.3 and mandoc-1.13.2)
will not be able to do it, either. It is planned for mandoc, but
work hasn't started yet. There are certainly no plans to support
that in groff, or i would have heard of it.

If you find *any* implementation of roff(7) that can handle UTF-8
*input* without running a recoder like preconv(1) first, i'd be
glad to hear that.

Now you might argue that the stunnel(8) maintainers assume
everybody has preconv(1) available. Strange assumption: as far as
i can tell, that's groff and mandoc only, and it works badly at
best for both of them. And even if stunnel(8) exclusively targets
groff, it's not up to the job:

  $ pod2man -u stunnel.pod | preconv -eutf8 | groff -mandoc -Tps \
      > stunnel.ps
  <standard input>:85: warning: can't find special character `u00A0'

... and the resulting PostScript file has "-fdN" without a blank
in the SYNOPSIS line.

  $ pod2man -u stunnel.pod | preconv -eutf8 | groff -mandoc -Tascii
  $ pod2man -u stunnel.pod | preconv -eutf8 | groff -mandoc -Tlatin1

don't give you the blank, either, even though it's seemingly easy
enough to translate a blank to ASCII.

By the way, even pod2man(1) itself is unable to properly handle
UTF-8 input. If you do *not* give -u, there is no attempt to encode
non-ASCII characters into roff(7) escape sequences, they are just
replaced with "X" characters. And i can't blame pod2man(1), it's
completely unclear what it should do. If i remember correctly,
last time i looked, i found four different ways to write UTF-8
escape sequences in the following three roff(7) implementations:
groff, Heirloom/Solaris and plan9. None of these escape syntaxes
worked for more than one implementation; groff has two alternative
syntaxes exhibiting a few very subtle, probably unintended
differences in the output produced. Anything that exists is utterly
non-portable.

So the only sane way i can see for manuals of portable software
is to not use any kind of non-ASCII characters, but instead do
ASCII transliterations for author names by hand when writing the
manuals, and most importantly *never* use pod2man(1) -u because
that breaks more than just UTF-8 characters. It also breaks spacing.

Yes, this is a mess, and at some point, i need to attack this maze
of problems. But it is complex. Cleaning up errno handling in
src/lib/libc/rpc and src/lib/libc/yp is a simpler task.

Yours,
  Ingo
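
P.S. For anyone who wants to see which characters hide behind byte
pairs like 0xc2 0xa0 or 0xc5 0x82 before hand-transliterating them,
here is a minimal sketch in Python. It is not part of mandoc, groff
or pod2man; the function name find_non_ascii and the sample input
are made up for illustration, and it assumes the file is valid UTF-8.

```python
import unicodedata


def find_non_ascii(data: bytes):
    """Return (line, column, char, name) for each non-ASCII character.

    Columns are 1-based byte offsets of the first byte of the
    character.  The input is assumed to be valid UTF-8; anything
    else raises UnicodeDecodeError.
    """
    hits = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        col = 1
        for ch in raw.decode("utf-8"):
            width = len(ch.encode("utf-8"))
            if width > 1:  # multi-byte sequence, hence non-ASCII
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((lineno, col, ch, name))
            col += width
    return hits


if __name__ == "__main__":
    # Invented sample, not copied from stunnel.8.
    sample = b".B stunnel\n-fd\xc2\xa0N\nMicha\xc5\x82\n"
    for lineno, col, ch, name in find_non_ascii(sample):
        print(f"{lineno}:{col}: U+{ord(ch):04X} {name}")
```

To check a real page, read it in binary mode and pass the bytes,
for example find_non_ascii(open("stunnel.8", "rb").read()). The
result only tells you where the offending characters are; picking
an ASCII replacement still has to be done by hand.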