On Wed, Oct 01, 2008 at 02:10:53AM -0700, Russ Allbery wrote:
> Niko Tyni <[EMAIL PROTECTED]> writes:
> > Any estimate on how widespread this POD problem is? Is the hardcoded
> > 'pod2man --utf8' in the Lenny perldoc going to cause more grief than
> > it's worth?
> >
> > I'm leaning towards reverting that and reopening #492037 until the issue is
> > sorted out in Pod-Perldoc upstream. Adding a way to enable or disable
> > the '--utf8' option on the perldoc command line is one possibility,
> > but it might well cause even further trouble if upstream chooses a
> > different implementation.
> 
> I looked at this some more, and there's a deeper problem.  If you run the
> current pod2man with --utf8 on an input POD file that doesn't declare an
> =encoding of UTF-8, any use of S<> in that POD file will result in invalid
> UTF-8, even if there's no use of high-bit characters in the input POD at
> all.

Thanks for pointing out =encoding to me; I completely missed that in the
documentation.

> I think the core problem was that Pod::Man is responsible for the output
> through the file handle and was missing an encoding layer.  The problem is
> that we can't just call encode() on the output, since that breaks if
> PERL_UNICODE is set or if an encoding was manually set on the file handle.
> You get double-encoding.  I think the least bad option is for Pod::Man and
> Pod::Text to force the encoding on their output file handles to UTF-8 when
> --utf8 is given.
> 
> The problem with this fix is that this now really will break pod2man
> --utf8 if POD documents don't have their encoding declared properly, since
> it will end up double-encoding the UTF-8 given that, without =encoding,
> Pod::Simple is treating the input as ISO 8859-1.  I think it's correct
> according to the specifications, but existing POD text that doesn't
> declare an encoding will get double-encoded output.  I can work around
> this by not setting a UTF-8 output encoding unless the input encoding is
> detected as UTF-8, but that's not really correct.  You *should* be able to
> have an input POD document with =encoding ISO-8859-1 and run it through
> pod2man --utf8 and get UTF-8 output.  But a POD document with no
> =encoding according to perlpodspec has an implicit =encoding ISO-8859-1.

While this is certainly something extra that people have to bear in mind
when using pod2man --utf8, it *is* an option people have to enable
manually (well, except for in perldoc; I suppose I'm more worried about
generated manual pages), and it doesn't seem too unreasonable to just
say that you have to specify =encoding when doing so. If that were
mentioned explicitly in the pod2man manual page then I think that would
be good enough.
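
To make the double-encoding failure concrete, here is a minimal sketch in
Python rather than Perl. It is not Pod::Man's actual code: `render` is a
hypothetical stand-in for "decode the input according to =encoding, then
write it through a UTF-8 output layer", and the implicit ISO-8859-1 default
follows the perlpodspec rule quoted above.

```python
from typing import Optional


def render(raw: bytes, declared_encoding: Optional[str] = None) -> bytes:
    """Hypothetical model of 'pod2man --utf8' with a forced UTF-8 output layer.

    raw               -- the bytes of the POD file as stored on disk
    declared_encoding -- value of =encoding, or None if the file has none
    """
    # Without =encoding the input is treated as ISO-8859-1 (assumption taken
    # from the perlpodspec behaviour described in the quoted message).
    input_encoding = declared_encoding or "iso-8859-1"
    text = raw.decode(input_encoding)
    # The output handle is forced to UTF-8.
    return text.encode("utf-8")


utf8_source = "café".encode("utf-8")        # b'caf\xc3\xa9'
latin1_source = "café".encode("iso-8859-1")  # b'caf\xe9'

# UTF-8 input, no =encoding: each byte of the two-byte sequence is
# re-encoded on its own, so the output is double-encoded.
broken = render(utf8_source)

# Same input with =encoding UTF-8: output is identical to the input.
fixed = render(utf8_source, "utf-8")

# Latin-1 input with =encoding ISO-8859-1: correctly converted to UTF-8.
converted = render(latin1_source, "iso-8859-1")
```

Here `broken` is four bytes long for the single character "é", while `fixed`
and `converted` both contain the expected two-byte sequence.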

Assuming that your intent is to run with UTF-8 across the board, then
just sticking "=encoding UTF-8" at the top of all POD files before
passing them to pod2man is sufficient, and that's not too hard. The diff
to debconf looks like this:

Index: doc/Makefile
===================================================================
--- doc/Makefile        (revision 2310)
+++ doc/Makefile        (working copy)
@@ -4,6 +4,9 @@
 pod2man=pod2man -c Debconf -r '' --utf8
 manpages:
        cd man && po4a po4a/po4a.cfg
+       for pod in man/*.pod; do \
+               perl -pi -e 'if (not $$seen and /^=head1/) { print "=encoding UTF-8\n\n"; $$seen = 1; }' $$pod; \
+       done
        install -d man/gen
        for num in 1 3 8; do \
                find man -maxdepth 1 -type f -name "*.$$num.pod" -printf '%P\n' | \

I'd prefer to do this with a po4a addendum, but it turns out to be an
absolute pain. Also this would break if any of the source documents
contained S<>. Maybe I should just change all the source documents
instead.
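
For anyone who would rather change the source documents once than patch them
at build time, the insertion step from the diff can be sketched as a
standalone function. This is an illustration in Python, not part of debconf;
the name `add_encoding` is made up here. Unlike the one-liner it skips files
that already declare an encoding, so it is safe to run repeatedly.

```python
import re


def add_encoding(pod: str, encoding: str = "UTF-8") -> str:
    """Insert an '=encoding' paragraph before the first '=head1' line.

    Returns the text unchanged if it already has an =encoding line, or if it
    has no =head1 line at all (the Perl one-liner also does nothing in the
    latter case).
    """
    if re.search(r"^=encoding\b", pod, flags=re.MULTILINE):
        return pod  # already declared; do not add a second one

    lines = pod.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.startswith("=head1"):
            # POD command paragraphs are separated by a blank line.
            lines.insert(index, f"=encoding {encoding}\n\n")
            break
    return "".join(lines)


sample = "=head1 NAME\n\ndebconf - example\n"
patched = add_encoding(sample)
patched_twice = add_encoding(patched)
```

Applying it to each man/*.pod file (read, transform, write back) would have
the same effect as the loop in the Makefile, minus the repeated insertion on
subsequent builds.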

Perhaps it would be helpful if po4a inserted an =encoding paragraph?
After all, it understands POD and it knows the encoding.

> I think that for lenny you may want to back out of the --utf8 change and
> give it some time to settle.

Hmm, this would be a shame. With your most recent patch it's now finally
possible for debconf to generate working manual pages for Russian and
French at the same time. I understand the perldoc problem though ...

-- 
Colin Watson                                       [EMAIL PROTECTED]


