On Thu, Feb 28, 2008 at 09:21:41PM +0100, Adam Borowski wrote:
> On Thu, Feb 28, 2008 at 10:42:30AM +0100, Michelle Konzack wrote:
> > It seems there is a common problem while setting up the correct UNICODE
> > locale in systems.  As the poster in the attached message has written,
> > he has setup his locale to "zh_CN.utf8" which is wrong, but as he has
> > written too, the output of "locale -a" show it.
>
> No matter which way the _locale_ is spelt (including "vi_VI" without even the
> word "utf" inside),

Irrelevant to this bug, as you'll see if you look at the code.

> the _charset_ is UTF-8.  No program ever should look at the locale's
> name, as it has more quirks like this.  Checking the charset will get
> you what you want.
>
> > I think, there should be a global solution for this, since patching
> > man-db is worthless.
>
> Actually, it's groff what's at fault here.  Mostly.

man-db really does have some special-casing here. Trust me. It was
necessary at the time. There are a finite number of known aliases for
the very small number of locales in question, and until it becomes
unnecessary I will simply support those. (And I agree that it should go
away, but can't easily just yet.)

Please don't drag groff into this bug. I really hate it when bugs drift
wildly off their original (accurately-constrained) topic despite
attempts to haul them back. It makes them impossible to keep organised.

> > $ LANG=zh_CN.UTF-8 man --warnings -l ls.zh_CN.1 > /dev/null
> > $ LANG=zh_CN.utf8 man --warnings -l ls.zh_CN.1 > /dev/null
> > <standard input>:9: warning: can't find special character `u013F'
> > <standard input>:9: warning: can't find special character `u011A'
> > <standard input>:9: warning: can't find special character `u021D'
> > <standard input>:11: warning: can't find special character `u0321'
> > <standard input>:11: warning: can't find special character `u04AA'
> > <standard input>:12: warning: can't find special character `u0461'
> > // snip
>
> Too bad, groff doesn't have real Unicode support, and supports only several
> special-cased locales (which may then be transcoded as UTF-8, but they still
> get wrapped into their old-style charsets).
>
> Instead of changing the special-case recognition, I would instead completely
> skip special-casing and just treat all characters equally.  Including, but
> not limited to, u013F and u0461.

Are you working with Brian M. Carlson on this?
He has been working on a solution acceptable to groff upstream, which
is, frankly, the only way I want to go now. He has already made
substantial progress with character class support.

Treating all characters equally will absolutely not be acceptable to
groff upstream. groff is a typesetter and needs to know about properties
of characters.

Cheers,

-- 
Colin Watson [EMAIL PROTECTED]