On Wed, Jul 01, 2009 at 11:04:03PM +0500, Stepan Golosunov wrote:
> On 30.06.2009 at 12:06:44 +0100, Colin Watson wrote:
> > w3mman should set MAN_KEEP_FORMATTING=1 in the environment to instruct
> > man not to invoke col to strip away formatting characters, which it
> > normally does by default when writing to a pipe. I added this feature to
> > man-db with the express intention that it should be used by programs
> > like pinfo and w3mman that invoke man and can do something with its
> > formatted output. Patch attached.
> 
> Actually, w3mman in lenny shows underlined characters *unless* called
> with MAN_KEEP_FORMATTING=1 (they just aren't underlined).

Assuming that you're referring to the same test case (LC_ALL=ru_RU.UTF-8
w3mman cp), this appears to be a separate bug; w3mman2html.cgi is
failing to deal with the sequence "_" BACKSPACE <UTF-8 character>,
presumably stripping off the first byte of the UTF-8 character and
attempting to underline that. I imagine it has the same trouble with
bold (<UTF-8 character> BACKSPACE <same UTF-8 character>).

This should be straightforward enough to fix if you have the patience to
dig through the relevant regular expressions. :-) It clearly ought to be
fixed.
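
To illustrate the two overstrike sequences described above, here is a
minimal Python sketch. It is not w3mman2html.cgi's code, and the function
name overstrike_to_html is hypothetical. It assumes the line has already
been decoded from UTF-8, so that each sequence is exactly three code
points and a multibyte character is never split:

```python
import html


def overstrike_to_html(line: str) -> str:
    """Convert overstrike sequences in an already-decoded line to HTML.

    Recognised sequences (each is three code points):
      "_" BACKSPACE X   -> underlined X
      X   BACKSPACE X   -> bold X
    """
    out = []
    i = 0
    n = len(line)
    while i < n:
        # Look for <char> BACKSPACE <char> starting at position i.
        if i + 2 < n and line[i + 1] == "\b":
            first, second = line[i], line[i + 2]
            if first == "_":
                out.append("<u>" + html.escape(second) + "</u>")
                i += 3
                continue
            if first == second:
                out.append("<b>" + html.escape(second) + "</b>")
                i += 3
                continue
        # Ordinary character: escape it and move on.
        out.append(html.escape(line[i]))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    # Bold heading bytes as quoted in this thread, decoded before matching.
    raw = bytes.fromhex("d0 98 08 d0 98 d0 9c 08 d0 9c d0 af 08 d0 af")
    print(overstrike_to_html(raw.decode("utf-8")))
    # prints <b>И</b><b>М</b><b>Я</b>
```

The point of the sketch is only the order of operations: decode first,
then match overstrikes on characters rather than bytes.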

> But it hides non-ascii section headings when called *without*
> MAN_KEEP_FORMATTING=1.
> And this seems to be because man in this case produces something
> bogus.
> 
> 
> This is the first section heading (ИМЯ), generated by
> "MAN_KEEP_FORMATTING=1 man cp|hd" (in ru_RU.UTF-8 locale):
> d0 98 08 d0 98 d0 9c 08 d0 9c d0 af 08 d0 af 0a
> 
> 
> But "man cp|hd" generates invalid utf-8:
> d0 d0 98 d0 d0 9c d0 d0 af 0a
> 
> It's supposed to be as in "echo ИМЯ|hd":
> d0 98 d0 9c d0 af 0a

Sure, I'm entirely familiar with that symptom, which is actually a col
bug, namely #319952. The point of MAN_KEEP_FORMATTING=1 is to skip the
call to col, thus as a side-effect dodging that bug.
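
The hexdumps quoted above are consistent with backspaces being applied
one byte per column. The following Python sketch models that symptom
under exactly that assumption; it is not taken from col's source, and
apply_backspaces is a hypothetical helper:

```python
def apply_backspaces(cells, backspace):
    """Resolve backspaces, treating every item of `cells` as one column.

    A backspace moves the write position one column to the left; the next
    item then overwrites whatever was stored in that column.
    """
    columns = []
    pos = 0
    for cell in cells:
        if cell == backspace:
            pos = max(pos - 1, 0)
        elif pos < len(columns):
            columns[pos] = cell
            pos += 1
        else:
            columns.append(cell)
            pos += 1
    return columns


# Output of "MAN_KEEP_FORMATTING=1 man cp|hd" as quoted in this thread.
raw = bytes.fromhex("d0 98 08 d0 98 d0 9c 08 d0 9c d0 af 08 d0 af 0a")

# Byte-wise: each byte of a two-byte character counts as its own column.
bytewise = bytes(apply_backspaces(raw, 0x08))

# Character-wise: decode first, so each character is one column.
charwise = "".join(apply_backspaces(raw.decode("utf-8"), "\b")).encode("utf-8")

if __name__ == "__main__":
    print(bytewise.hex(" "))  # d0 d0 98 d0 d0 9c d0 d0 af 0a
    print(charwise.hex(" "))  # d0 98 d0 9c d0 af 0a
```

Under this model the byte-wise result matches the invalid "man cp|hd"
output, and the character-wise result matches "echo ИМЯ|hd".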

For a program that handles the formatting typically emitted by groff, it
is unambiguously correct to set MAN_KEEP_FORMATTING=1 to skip the col
invocation. It hadn't occurred to me that w3mman would handle UTF-8
characters wrongly in this mode, but that should be easy enough to fix.
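
For a caller that wants the formatted output, setting the variable
amounts to something like the Python sketch below. This is an
illustration only, not the attached patch; the helper names
man_environment and formatted_man_page are hypothetical, and it assumes
a man-db "man" on PATH:

```python
import os
import subprocess


def man_environment(base=None):
    """Return a copy of the environment with MAN_KEEP_FORMATTING=1 set."""
    env = dict(os.environ if base is None else base)
    env["MAN_KEEP_FORMATTING"] = "1"
    return env


def formatted_man_page(name):
    """Run man on `name` through a pipe and return its raw output bytes.

    With MAN_KEEP_FORMATTING=1 the overstrike sequences are kept, so the
    caller must interpret them itself.
    """
    result = subprocess.run(
        ["man", name],
        env=man_environment(),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

The returned bytes should be decoded according to the locale's
character set before any overstrike handling is applied.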

-- 
Colin Watson                                       [[email protected]]


