Hi Pádraig,

> I've attached a gnulib patch to document for iscntrl at least.
> +This function does not support arguments outside of the range of the
> +unsigned char type in locales with large character sets, on some platforms.
> +OS X 10.5 will return non zero for characters >= 0x80 in UTF-8 locales.

In UTF-8 locales, arguments >= 0x80 are invalid arguments for iscntrl().

POSIX [1] says
  "The c argument is a type int, the value of which the application shall
   ensure is a character representable as an unsigned char or equal to the
   value of the macro EOF. If the argument has any other value, the
   behavior is undefined."

The term "character" is defined here [2]:
  "A sequence of one or more bytes representing a single graphic symbol or
   control code."

So, in a UTF-8 locale, a "character representable as an unsigned char" is
a byte sequence of length 1, where the single byte has a value in the range
0x00..0x7F.

For invalid values "the behavior is undefined." You were expecting a
value 0.

Now, in the gnulib documentation, what we mention as portability problems
are the cases where
  - the behaviour for valid arguments is different on different platforms,
    or
  - the boundary between valid and invalid arguments is fuzzy and depends
    on the platform.

IMO there's no point in documenting that a function _really_ has undefined
behaviour when POSIX says that it has undefined behaviour.

> I've also attached an alternative patch for df (in your name).

This patch is correct (because the characters that you test for in
c_iscntrl are 0x00..0x1F, 0x7F, which don't occur as second or later byte
in a multibyte character in the EUC-JP, EUC-KR, GB2312, EUC-TW, GB18030,
SJIS encodings). But it does not catch control characters outside of the
ASCII range. It would make sense to catch these as well.

If you want to do that, 'hide_problematic_chars' needs to be rewritten as
a loop that iterates across the multibyte characters. For example with the
'mbiter' module, in combination with the mb_iscntrl function from the
'mbchar' module. Or directly with mbrtowc() and iswcntrl().
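
A rough sketch of the last option, mbrtowc() plus iswcntrl(), could look
like the following. It is only an illustration, not the actual df code:
the name replace_cntrl_chars is hypothetical, and replacing invalid or
incomplete byte sequences with '?' is an assumption of this sketch rather
than something settled in this thread.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not the actual df code): overwrite every control
   character in the NUL-terminated multibyte string BUF with '?'.
   Each byte of a multibyte control character becomes one '?'.  */
void
replace_cntrl_chars (char *buf)
{
  char *p = buf;
  char *end = buf + strlen (buf);
  mbstate_t state;

  memset (&state, 0, sizeof state);   /* start in the initial shift state */

  while (p < end)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, p, (size_t) (end - p), &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete sequence.  Assumption of this sketch:
             treat the offending byte as problematic, then resynchronize
             from the initial state at the next byte.  */
          *p++ = '?';
          memset (&state, 0, sizeof state);
          continue;
        }

      if (n == 0)
        n = 1;   /* defensive: cannot occur before END, which is the NUL */

      if (iswcntrl ((wint_t) wc))
        memset (p, '?', n);

      p += n;
    }
}
```

The caller is expected to have called setlocale (LC_ALL, "") beforehand,
so that mbrtowc() decodes according to the locale's encoding. Unlike a
per-byte iscntrl() call, this never passes a byte >= 0x80 to a <ctype.h>
function, so it stays within the behaviour that POSIX defines.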

Bruno

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/iscntrl.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_87