Hi,

Liviu Daia wrote on Fri, Oct 24, 2014 at 08:37:31AM +0300:
> On 24 October 2014, Ingo Schwarze <schwa...@usta.de> wrote:
>> Gleydson Soares wrote on Thu, Oct 23, 2014 at 09:11:36PM -0300:
>>> On Thu, Oct 23, 2014 at 10:36:44AM -0300, Gonzalo L. Rodriguez wrote:

>>>> -USE_GROFF =               Yes

>>> mandoc complains:
>>>
>>> $ mandoc -Tlint -Werror stunnel.8       
>>> mandoc: stunnel.8:35:2: ERROR: skipping unknown macro: 'br\&
>>> mandoc: stunnel.8:85:37: ERROR: skipping bad character: 0xc2
>>> mandoc: stunnel.8:85:38: ERROR: skipping bad character: 0xa0
>>> mandoc: stunnel.8:1084:11: ERROR: skipping bad character: 0xc5
>>> mandoc: stunnel.8:1084:12: ERROR: skipping bad character: 0x82
>>> mandoc: stunnel.8:1085:16: ERROR: skipping bad character: 0xc5
>>> mandoc: stunnel.8:1085:17: ERROR: skipping bad character: 0x82
>>> $
>>> 
>>> are you sure you want to zap groff?

>> Yes, it's a perlpod(1) manual, and these particular errors are
>> harmless.
>> 
>>  - 35:2 has no ill effect; actually, it's a bug in mandoc(1) that
>>         this bogus message is shown, i will look into fixing it.
>>  - 85:37-38 is merely a bug in the manual,
>>             two stray gibberish eight bit bytes

>     Not really, 0xC2 0xA0 is UTF-8 for Unicode "NO-BREAK SPACE" (U+00A0):
> 
>         http://www.fileformat.info/info/unicode/char/a0/index.htm
> 
>     There are probably more of these around,

No kidding.

> various *roff tools produce them.

Really?  Hopefully not.  If you run into tools doing that, please
do report them to me.  I am willing to hunt those bugs down and
talk to the upstream maintainers of such broken tools.
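
As a side note, the byte pairs from the mandoc output quoted above can
be decoded in a few lines of Python.  This is only an illustrative
sketch using the standard library; the helper name describe_utf8 is
made up for this example and is not part of any tool mentioned here.

```python
import unicodedata

def describe_utf8(raw: bytes) -> str:
    """Decode a UTF-8 byte sequence and name each resulting character."""
    text = raw.decode("utf-8")
    return ", ".join(
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    )

# Byte pairs reported at stunnel.8:85 and at stunnel.8:1084-1085.
print(describe_utf8(b"\xc2\xa0"))
print(describe_utf8(b"\xc5\x82"))
```

The first pair decodes to U+00A0 NO-BREAK SPACE, the second to U+0142
LATIN SMALL LETTER L WITH STROKE, so each "bad character" pair is one
UTF-8 encoded character rather than two unrelated bytes.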

In the case at hand, you can claim for sure that Russ Allbery's
pod2man(1) and David Wheeler's Pod::Simple are excessively complicated,
but they are not broken in this respect.  They produce correct
output by default.

The problem here is that the stunnel(8) maintainers don't know what
they are doing.  In Makefile.in, they pass the -u option (use UTF-8
in the generated roff(7) code) to pod2man(1), even though the manual
explicitly states "Many *roff implementations cannot handle non-ASCII
characters".  That is a massive understatement.  I do not know of
any implementation of roff(7) that can handle that.  Definitely
no version of groff or mandoc ever could, and the upcoming
releases of these two (groff-1.22.3 and mandoc-1.13.2) will not be
able to do it, either.  It is planned for mandoc, but work hasn't
started yet.  There are certainly no plans to support that in groff,
or i would have heard of it.  If you find *any* implementation of
roff(7) that can handle UTF-8 *input* without running a recoder
like preconv(1) first, i'd be glad to hear that.

Now you might argue that the stunnel(8) maintainers assume
everybody has preconv(1) available.  That would be a strange
assumption: as far as i can tell, only groff and mandoc provide it,
and it works badly at best for both of them.  And even if stunnel(8)
exclusively targets groff, it's not up to the job:

   $ pod2man -u stunnel.pod | preconv -eutf8 | groff -mandoc -Tps \
       > stunnel.ps
  <standard input>:85: warning: can't find special character `u00A0'

 ... and the resulting PostScript file has "-fdN" without a blank
in the SYNOPSIS line.

   $ pod2man -u stunnel.pod | preconv -eutf8 | groff -mandoc -Tascii
   $ pod2man -u stunnel.pod | preconv -eutf8 | groff -mandoc -Tlatin1

don't give you the blank, either, even though it's seemingly easy
enough to translate a blank to ASCII.
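
To make the recoding step in these pipelines more concrete, here is a
rough Python sketch of the basic idea: every non-ASCII character is
rewritten as a groff-style \[uXXXX] escape, the form that shows up in
the warning above.  The function name is hypothetical, and the real
preconv(1) does more than this (for example, it has to work out the
input encoding), so treat it as an approximation only.

```python
def recode_to_groff_escapes(text: str) -> str:
    """Replace every non-ASCII character with a groff \\[uXXXX] escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII passes through unchanged
        else:
            out.append("\\[u%04X]" % cp)  # e.g. U+00A0 becomes \[u00A0]
    return "".join(out)

# "-fd" followed by a no-break space and "N", as in the SYNOPSIS line.
print(recode_to_groff_escapes("-fd\u00a0N"))
```

Whether the formatter then renders such an escape sensibly is a
separate question, which is exactly where the pipelines above fall
short.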

By the way, even pod2man(1) itself is unable to properly handle
UTF-8 input.  If you do *not* give -u, there is no attempt to
encode non-ASCII characters into roff(7) escape sequences; they are
just replaced with "X" characters.  And i can't blame pod2man(1),
it's completely unclear what it should do.  If i remember correctly,
last time i looked, i found four different ways to write UTF-8
escape sequences in the following three roff(7) implementations:
groff, Heirloom/Solaris and plan9.  None of these escape syntaxes
worked for more than one implementation; groff has two alternative
syntaxes exhibiting a few very subtle, probably unintended differences
in the output produced.  Anything that exists is utterly non-portable.

So the only sane way i can see for manuals of portable software is
to not use any kind of non-ASCII characters, but instead do ASCII
transliterations for author names by hand when writing the manuals,
and most importantly *never* use pod2man(1) -u because that breaks
more than just UTF-8 characters.  It also breaks spacing.
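
One way to check a generated manual against this rule is to scan it
for bytes outside the ASCII range.  The following Python sketch does
that; the function name and the sample input are invented for
illustration, and the 1-based line:column reporting is merely chosen
to resemble the mandoc messages quoted above, with no guarantee that
it matches them in every case.

```python
def find_non_ascii(data: bytes):
    """Return (line, column, byte) for every byte >= 0x80, both 1-based."""
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte >= 0x80:
                hits.append((lineno, col, byte))
    return hits

# Hypothetical two-line sample containing a UTF-8 no-break space.
sample = b".SH SYNOPSIS\n-fd\xc2\xa0N\n"
for lineno, col, byte in find_non_ascii(sample):
    print("%d:%d: non-ASCII byte 0x%02x" % (lineno, col, byte))
```

An empty result means the file is pure ASCII; anything else points at
a place that needs a transliteration by hand.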

Yes, this is a mess, and at some point, i need to attack this maze
of problems.  But it is complex.  Cleaning up errno handling in
src/lib/libc/rpc and src/lib/libc/yp is a simpler task.

Yours,
  Ingo
