On Sun, Nov 09, 2008 at 04:48:33AM -0500, Matt Wozniski wrote:
> As you already know, man uses 'col -b -p -x' in its pipeline when piping
> the man page to a child process, unless $MAN_KEEP_FORMATTING is set,
> and col doesn't handle UTF-8 properly.  Because of this, some man pages
> are no longer valid UTF-8 by the time they have been processed, as
> "man xterm | iconv" shows.  However, even without -b, col reorders the
> text so that each character is immediately followed by the backspace
> that erases it, if any; so "abc^H^H^Hxyz" becomes "a^Hxb^Hyc^Hz".  I'd
> like to suggest that we use this to fix the problem, by changing the
> end of the pipeline from '| col -b -p -x' to
> '| col -p -x | sed -e "s/^\x08*//" -e "s/.\x08//g"'.  I doubt this would
> ever be accepted upstream, since it depends on what I think is an
> implementation detail of 'col'.  But since we have a col that doesn't
> support multibyte characters yet rearranges the text so that it is easy
> to postprocess, and a sed that does support multibyte, this seems like
> a quick-and-dirty way to fix the problem.
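The proposed sed stage can be checked on its own. The following is a minimal sketch, assuming GNU sed (which understands the \x08 escape) and input already in the reordered form described above. The helper name strip_overstrike and the sample strings are hypothetical; they are written by hand in that form rather than produced by col. In a real man pipeline sed would also have to run in a UTF-8 locale so that '.' matches a whole multibyte character; the samples here are plain ASCII.

```shell
#!/bin/sh
# Hypothetical helper: the sed stage from the proposal, assuming GNU sed
# and input in which every backspace directly follows the character it
# erases (the ordering 'col -p -x' is assumed to produce).
strip_overstrike() {
    # 1. drop any backspaces at the very start of a line;
    # 2. delete each "character + backspace" pair, keeping the character
    #    that overwrites it.
    sed -e 's/^\x08*//' -e 's/.\x08//g'
}

# Hand-written sample inputs in the assumed reordered form.
replaced=$(printf 'a\bxb\byc\bz\n' | strip_overstrike)    # from "abc^H^H^Hxyz"
bold=$(printf 'l\bls\bs\n' | strip_overstrike)            # nroff-style bold "ls"
underlined=$(printf '_\bm_\ba_\bn\n' | strip_overstrike)  # nroff-style underlined "man"

printf '%s\n' "$replaced" "$bold" "$underlined"
# prints:
# xyz
# ls
# man
```

Because each erased character is assumed to sit directly before its own backspace, a single global substitution is enough and sed never has to count runs of backspaces the way col does internally.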

Thanks for this suggestion and your patch.

As for acceptance upstream: since I am upstream, I am obviously looking
for something that is acceptable to myself, to some extent. :-)


I have to say that I wonder why we can't just get #319952 fixed rather
than having to work around it further in man. Daniel, in August 2007 you
said:

  "i'm currently working on porting all debian changes to proper patches
  against freebsd; however, this needs another weekend time.. but i'm
  definitely on it."

Have you had a chance to make any further progress on this? That said, I
can see that this workaround with sed would be a more plausible
candidate for late inclusion in lenny than full UTF-8 support in col
would be.


Given that we know col is broken on some systems, I wonder whether the
correct upstream solution would simply be to reimplement the part of col
that we need and call it as an internal pipeline-processing function.
I'm not a huge fan of duplication in general, but it's not as if col is
hugely complex or changes much.

-- 
Colin Watson                                       [EMAIL PROTECTED]