Package: unicode
Version: 2.8-1.1
Severity: normal

unicode(1) makes some effort to make its help text (from the --help option) conform to the charset that is nominated by the environmentally-selected locale, but it screws this up. I see two distinct failure modes, and I'm not clear whether this is one bug or two. For the purposes of my test cases here, I'll be specifying a locale only through the LC_ALL environment variable, in order to avoid the trouble that I described in Bug#1061103 regarding unicode(1) having faulty logic for discerning the locale and its charset.

When it perceives a UTF-8 locale, unicode(1) successfully emits help text, encoded in UTF-8:

$ env - LC_ALL=de_DE.utf8 unicode --help |egrep 'I/O|EU' | LC_ALL=C od -tc
0000000
0000020                                   I   /   O       c   h   a   r
0000040   a   c   t   e   r       s   e   t   ,       I       a   m
0000060   g   u   e   s   s   i   n   g       U   T   F   -   8  \n
0000100
0000120                               D   i   s   p   l   a   y       A
0000140   S   C   I   I       t   a   b   l   e       (   E   U 342 200
0000160 223   U   K       T   r   a   d   e       a   n   d       C   o
0000200   o   p   e   r   a   t   i   o   n  \n
0000212

When it perceives an ASCII locale, it again emits help text, but it doesn't conform to the ASCII encoding. Instead it uses UTF-8:

$ env - LC_ALL=C unicode --help |egrep 'I/O|EU' | LC_ALL=C od -tc
0000000
0000020                                   I   /   O       c   h   a   r
0000040   a   c   t   e   r       s   e   t   ,       I       a   m
0000060   g   u   e   s   s   i   n   g       A   N   S   I   _   X   3
0000100   .   4   -   1   9   6   8  \n
0000120
0000140   D   i   s   p   l   a   y       A   S   C   I   I       t   a
0000160   b   l   e       (   E   U 342 200 223   U   K       T   r   a
0000200   d   e       a   n   d       C   o   o   p   e   r   a   t   i
0000220   o   n  \n
0000223

When it perceives a Latin-1 locale, it fails to emit any help text, apparently due to that en dash not being encodable in Latin-1:

$ env - LC_ALL=de_DE.iso88591 unicode --help |egrep 'I/O|EU' | LC_ALL=C od -tc
Traceback (most recent call last):
  File "/usr/bin/unicode", line 1014, in <module>
    main()
  File "/usr/bin/unicode", line 941, in main
    (options, arguments) = parser.parse_args()
  File "/usr/lib/python3.9/optparse.py", line 1387, in parse_args
    stop = self._process_args(largs, rargs, values)
  File "/usr/lib/python3.9/optparse.py", line 1427, in _process_args
    self._process_long_opt(rargs, values)
  File "/usr/lib/python3.9/optparse.py", line 1501, in _process_long_opt
    option.process(opt, value, values, self)
  File "/usr/lib/python3.9/optparse.py", line 784, in process
    return self.take_action(
  File "/usr/lib/python3.9/optparse.py", line 807, in take_action
    parser.print_help()
  File "/usr/lib/python3.9/optparse.py", line 1647, in print_help
    file.write(self.format_help())
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' in position 1661: ordinal not in range(256)
0000000

unicode(1) ought to successfully emit help text in any of these locales, and the help text ought to always conform to the environmentally-selected locale. For ASCII and Latin-1 locales this implies that it can't use that en dash, and must substitute an ASCII "-". I have no strong opinion about whether it should use the en dash in a UTF-8 locale.

-zefram
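
For illustration, the substitution requested above could look roughly like the following in Python, the language unicode(1) is written in according to the traceback. This is only a sketch under stated assumptions, not a patch against the actual source: the function name fit_to_charset and the ASCII_FALLBACKS table are hypothetical, the "?" replacement for any other unencodable character is an assumption, and how the program would obtain the target encoding (for example from sys.stdout.encoding) is left open.

```python
# Hypothetical fallback table: only the en dash is known from the report.
ASCII_FALLBACKS = {"\u2013": "-"}


def fit_to_charset(text, encoding):
    """Return text with characters that `encoding` cannot represent
    replaced by an ASCII fallback ("?" if no fallback is listed)."""
    out = []
    for ch in text:
        try:
            # Probe whether this single character is encodable.
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(ASCII_FALLBACKS.get(ch, "?"))
    return "".join(out)


if __name__ == "__main__":
    sample = "EU\u2013UK Trade and Cooperation"
    for enc in ("utf-8", "latin-1", "ascii"):
        # ascii() makes the result visible regardless of the terminal charset.
        print(enc, ascii(fit_to_charset(sample, enc)))
```

With this sketch, the en dash survives when the target encoding is "utf-8" and becomes "-" for "latin-1" and "ascii", so writing the result to a stream in that encoding would not raise UnicodeEncodeError.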