On Sun, Nov 09, 2008 at 04:48:33AM -0500, Matt Wozniski wrote:
> As you already know, man uses 'col -b -p -x' in its pipeline when piping
> the man page to a child process, unless $MAN_KEEP_FORMATTING is
> specified, and col doesn't handle UTF-8 properly. Because of this, some
> man pages are invalid UTF-8 by the time they're done being handled, such
> as "man xterm | iconv". However, even if -b isn't passed to col, it
> reorders things so that each character is immediately followed by the
> backspace that erases it, if any - so, "abc^H^H^Hxyz" becomes
> "a^Hxb^Hyc^Hz". I'd like to suggest that we use this to fix the
> problem, by changing the ending of the pipeline from '| col -b -p -x' to
> '| col -p -x | sed -e "s/^\x08*//" -e "s/.\x08//g"'. I don't think
> this would ever be likely to be accepted upstream, since it depends on
> (I think?) an implementation detail of 'col', but since we have a col
> that doesn't support multibyte, but rearranges the text to be easily
> postprocessed, and a sed that does support multibyte, this seems like
> a quick-and-dirty way to fix the problem.
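For illustration only, here is a rough Python sketch of what the two sed substitutions quoted above do to text that is already in the reordered "a^Hxb^Hyc^Hz" form. The function name strip_overstrikes is hypothetical, and the sketch assumes the input has already been decoded to Unicode text, so that "." matches a whole character rather than a single byte. It does not emulate col itself.

```python
import re

BACKSPACE = "\x08"

def strip_overstrikes(text: str) -> str:
    """Hypothetical helper mirroring the two sed expressions.

    Assumes `text` is decoded Unicode in which every overstruck
    character is immediately followed by the backspace that erases it.
    """
    # s/^\x08*//  : drop any backspaces at the start of each line
    text = re.sub(r"^\x08*", "", text, flags=re.MULTILINE)
    # s/.\x08//g  : drop each character together with the backspace after it
    # ("." does not match a newline, as with sed working line by line)
    return re.sub(r".\x08", "", text)

if __name__ == "__main__":
    reordered = "a" + BACKSPACE + "x" + "b" + BACKSPACE + "y" + "c" + BACKSPACE + "z"
    print(strip_overstrikes(reordered))
```

Because the substitution works on characters rather than bytes, a multibyte character that has been overstruck is removed as a unit, which is the property the proposal relies on from a multibyte-aware sed.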
Thanks for this suggestion and your patch. With regard to acceptance upstream: since I am upstream, obviously I am looking for something acceptable to myself to some extent. :-)

I have to say that I do wonder why we can't just get #319952 fixed rather than having to work around it further in man. Daniel, in August 2007 you said:

  "i'm currently working on porting all debian changes to proper patches against freebsd; however, this needs another weekend time.. but i'm definitely on it."

Have you had a chance to make any further progress on this?

That said, I can see that this workaround with sed would be a more plausible candidate for late inclusion in lenny than full UTF-8 support in col would be.

Given that we know that col is broken on some systems, I wonder if the correct upstream solution wouldn't simply be to reimplement the part of col that we need, and call it as an internal pipeline-processing function. I'm not a huge fan of duplication in general, but it's not as if col is hugely complex, nor as if it changes much.

-- 
Colin Watson [EMAIL PROTECTED]