On Wed, Oct 01, 2008 at 11:26:23AM -0700, Russ Allbery wrote:
> Niko Tyni <[EMAIL PROTECTED]> writes:
>
> >> I think that for lenny you may want to back out of the --utf8 change and
> >> give it some time to settle.
> >
> > Are you referring to backing out the whole Pod::Man update (#480997)
> > or just the hardcoded 'pod2man --utf8' in perldoc (#492037) ?
>
> Sorry, I meant only the pod2man --utf8 change in perldoc. I think that
> the behavior of pod2man, while not ideal, is still basically okay for
> lenny, although I'll be releasing a new version of podlators that will
> implement the changes described in my previous mail.

Hm, this is looking worse the more I stare at it.

I've been testing pod2man with the attached .pod file that does have
'=encoding UTF-8', and the current Debian (from 5.10.0-15) 'pod2man --utf8'
gives these results:

- the Finnish "a with two dots", i.e. LATIN SMALL LETTER A WITH DIAERESIS,
  is output as its ISO-8859-1 representation (octal 344)
- the Russian letter "n", CYRILLIC SMALL LETTER EN, is output in UTF-8:
  octal 320+275. However, there's a warning:

    Wide character in print at /usr/share/perl/5.10/Pod/Man.pm line 717.

- S<one two> gets the ISO-8859-1 NO-BREAK SPACE in between

So the output is ISO-8859-1 where possible and UTF-8 elsewhere. I really
don't think this is acceptable. The pod2man output will almost never be
valid UTF-8.

Russ, I think the binmode($output, ":utf8") really belongs in pod2man
instead of Pod::Man. Users of Pod::Man should do that themselves for their
output file handle when they use the 'utf8' option. (This needs
documentation, of course.)

However, pod2man currently uses the parse_from_file() method, which is
just a compatibility wrapper in Pod::Simple that does the open() and
output_fh() calls. I suppose this should go in pod2man itself.

Something like the attached patch might do, although I see there's
some deeper magic in Pod::Simple.

This still doesn't break anything not explicitly using the '--utf8' option,
so I suppose we could get it in lenny...

Comments welcome.
-- 
Niko Tyni   [EMAIL PROTECTED]
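
The mixed output described in this message can be reproduced without Pod::Man at all. The sketch below is only an illustration under stated assumptions: emit_bytes() and is_valid_utf8() are hypothetical helper names, an in-memory file handle stands in for the real output file, and each string is sent with its own print() call, on the assumption that this is roughly how the formatter emits its output. Without an output layer, Perl writes the character that fits in ISO-8859-1 as a single byte and falls back to UTF-8 (with a "Wide character" warning) for the one that does not. With binmode($fh, ":utf8") both come out as UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: print each string with a separate print() call to an
# in-memory handle, optionally with a :utf8 layer, and return the raw bytes.
sub emit_bytes {
    my ($use_utf8_layer, @strings) = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "cannot open in-memory handle: $!";
    binmode($fh, ':utf8') if $use_utf8_layer;
    print {$fh} $_ for @strings;
    close($fh) or die "close failed: $!";
    return $buf;
}

# Hypothetical helper: true if the byte string decodes cleanly as UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # utf8::decode works in place, so decode a copy
    return utf8::decode($copy) ? 1 : 0;
}

# Render a byte string as space-separated octal values.
sub octal_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%03o', ord } split //, $bytes;
}

# LATIN SMALL LETTER A WITH DIAERESIS, then CYRILLIC SMALL LETTER EN.
my @text = ("\x{e4}\n", "\x{43d}\n");

# Collect warnings instead of letting them go to STDERR.
my @warnings;
my $mixed = do {
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    emit_bytes(0, @text);
};
my $clean = emit_bytes(1, @text);

printf "no layer:    %s (valid UTF-8: %s)\n",
    octal_dump($mixed), is_valid_utf8($mixed) ? 'yes' : 'no';
printf ":utf8 layer: %s (valid UTF-8: %s)\n",
    octal_dump($clean), is_valid_utf8($clean) ? 'yes' : 'no';
print "warnings without layer: ", scalar(@warnings), "\n";
```

This is the same reason the proposed patch calls binmode() on the output handle before handing it to the parser: the layer has to be in place before the first print().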

=encoding UTF-8

=head1 a with two dots

ä

=head1 russian letter n

н

=head1 non-breaking spaces

S<one two>

diff --git a/pod/pod2man.PL b/pod/pod2man.PL
index 3abb658..a9b5b67 100644
--- a/pod/pod2man.PL
+++ b/pod/pod2man.PL
@@ -89,7 +89,22 @@ my @files;
 do {
     @files = splice (@ARGV, 0, 2);
     print " $files[1]\n" if $verbose;
-    $parser->parse_from_file (@files);
+    if ($options{utf8}) {
+        my ($in, $out) = (*STDIN, *STDOUT);
+        $in = $files[0] if @files;
+        if (@files == 2) {
+            open($out, ">", $files[1])
+                or die("open $files[1] for writing: $!");
+        } else {
+            $out = *STDOUT;
+        }
+        binmode($out, ":utf8");
+        $parser->output_fh($out);
+        $parser->parse_file($in);
+        close $out;
+    } else {
+        $parser->parse_from_file (@files);
+    }
 } while (@ARGV);
 
 __END__