On Wed, Oct 01, 2008 at 11:26:23AM -0700, Russ Allbery wrote:
> Niko Tyni <[EMAIL PROTECTED]> writes:
> 
> >> I think that for lenny you may want to back out of the --utf8 change and
> >> give it some time to settle.
> >
> > Are you referring to backing out the whole Pod::Man update (#480997)
> > or just the hardcoded 'pod2man --utf8' in perldoc (#492037) ?
> 
> Sorry, I meant only the pod2man --utf8 change in perldoc.  I think that
> the behavior of pod2man, while not ideal, is still basically okay for
> lenny, although I'll be releasing a new version of podlators that will
> implement the changes described in my previous mail.

Hm, this is looking worse the more I stare at it.

I've been testing pod2man with the attached .pod file that does have
'=encoding UTF-8', and the current Debian (from 5.10.0-15) 'pod2man
--utf8' gives these results:

- the Finnish "a with two dots", i.e. LATIN SMALL LETTER A WITH DIAERESIS,
  is output as its ISO-8859-1 representation (octal 344)

- the Russian letter "n", CYRILLIC SMALL LETTER EN, is output in UTF-8:
  octal 320+275. However, there's a warning:

    Wide character in print at /usr/share/perl/5.10/Pod/Man.pm line 717.

- S<one two> gets the ISO-8859-1 NO-BREAK SPACE in between

So the output is ISO-8859-1 where possible and UTF-8 elsewhere.

I really don't think this is acceptable. Once the input contains any
non-ASCII character in the ISO-8859-1 range, the pod2man output will
almost never be valid UTF-8.
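
The mixed output described above can be reproduced without Pod::Man at
all. The following is only a minimal sketch, assuming the cause is Perl's
ordinary print() behaviour on a handle with no encoding layer; the helper
name printed_bytes is made up for illustration. Each string is printed in
its own print() call, since the downgrade to ISO-8859-1 is decided per
call.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bytes that print() produces when each
# string is printed in a separate call to an in-memory handle opened with
# the given mode.
sub printed_bytes {
    my ($mode, @strings) = @_;
    my $buffer = '';
    open(my $fh, $mode, \$buffer) or die "open in-memory handle: $!";
    {
        no warnings 'utf8';    # silences "Wide character in print"
        print {$fh} $_ for @strings;
    }
    close($fh) or die "close: $!";
    return $buffer;
}

# Format a byte string as space-separated octal values.
sub octal_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%03o', ord } split //, $bytes;
}

my $a_diaeresis = "\x{E4}";     # LATIN SMALL LETTER A WITH DIAERESIS
utf8::upgrade($a_diaeresis);    # stored as UTF-8 internally, as after decoding
my $cyrillic_en = "\x{43D}";    # CYRILLIC SMALL LETTER EN

my $raw  = printed_bytes('>',      $a_diaeresis, $cyrillic_en);
my $utf8 = printed_bytes('>:utf8', $a_diaeresis, $cyrillic_en);

print "no layer: ", octal_dump($raw),  "\n";
print ":utf8:    ", octal_dump($utf8), "\n";
```

Without a layer, the first character should come out as the single byte
344 and the second as 320 275, matching the results listed above. With
the :utf8 layer both characters are written as UTF-8.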

Russ, I think the binmode($output, ":utf8") really belongs in pod2man
instead of Pod::Man. Users of Pod::Man should do that themselves for
their output file handle when they use the 'utf8' option. (This needs
documentation, of course.)
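
For illustration, a caller following that convention might look roughly
like the sketch below. It is not taken from pod2man or the attached
patch: the function name pod_to_utf8_man, the 'TEST' name and the
in-memory output handle are assumptions made to keep the example
self-contained. How Pod::Man treats a handle that already has a :utf8
layer may differ between podlators releases.

```perl
use strict;
use warnings;
use Pod::Man;

# POD source as raw bytes; "\xC3\xA4" is the UTF-8 encoding of U+00E4.
my $pod = "=encoding UTF-8\n\n=head1 NAME\n\ntest - a with two dots: \xC3\xA4\n";

# Hypothetical wrapper: the caller, not Pod::Man, opens the output handle
# and puts the :utf8 layer on it before handing it to the parser.
sub pod_to_utf8_man {
    my ($pod_bytes) = @_;
    my $man = '';
    open(my $out, '>', \$man) or die "open in-memory handle: $!";
    binmode($out, ':utf8') or die "binmode: $!";

    my $parser = Pod::Man->new(utf8 => 1, name => 'TEST', section => 1);
    $parser->output_fh($out);
    $parser->parse_string_document($pod_bytes);
    close($out) or die "close: $!";
    return $man;
}

my $man = pod_to_utf8_man($pod);
print $man;
```

A real caller would open a file instead of an in-memory scalar; the
point is only that binmode() happens on the caller's side.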

However, pod2man currently uses the parse_from_file() method, which
is just a compatibility wrapper in Pod::Simple that does the open()
and output_fh() calls. I suppose those calls should go in pod2man
itself, so that it can call binmode() on the handle it opens.
Something like the attached patch might do, although I see there's some
deeper magic in Pod::Simple.

This still doesn't break anything not explicitly using the '--utf8'
option, so I suppose we could get it in lenny...

Comments welcome.
-- 
Niko Tyni   [EMAIL PROTECTED]

=encoding UTF-8

=head1 a with two dots

ä

=head1 russian letter n

н

=head1 non-breaking spaces

S<one two>

diff --git a/pod/pod2man.PL b/pod/pod2man.PL
index 3abb658..a9b5b67 100644
--- a/pod/pod2man.PL
+++ b/pod/pod2man.PL
@@ -89,7 +89,22 @@ my @files;
 do {
     @files = splice (@ARGV, 0, 2);
     print "  $files[1]\n" if $verbose;
-    $parser->parse_from_file (@files);
+    if ($options{utf8}) {
+        my ($in, $out) = (*STDIN, *STDOUT);
+        $in = $files[0] if @files;
+        if (@files == 2) {
+            open($out, ">", $files[1])
+                or die("open $files[1] for writing: $!");
+        } else {
+            $out = *STDOUT;
+        }
+        binmode($out, ":utf8");
+        $parser->output_fh($out);
+        $parser->parse_file($in);
+        close $out;
+    } else {
+        $parser->parse_from_file (@files);
+    }
 } while (@ARGV);
 
 __END__
