James K. Lowden wrote:
> In GNU iconv, the value 0xFF does not convert to the same value in
> Unicode.
>
> For UTF-8 to CP1140, 0xFF becomes 0xDF
> For CP1140 to UTF-8, 0xFF becomes 0x9F

In all charset converters that I've consulted (glibc 2.23, GNU libiconv,
ICU 2.2, JDK 1.5, Windows 2000, Windows 2016, AIX 4.3.2, z/OS), the
character set IBM-1140 or CP1140
  - maps 0xFF to U+009F (see also Wikipedia [1]),
  - maps 0xDF to U+00FF.
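You can check both mappings on any system that provides iconv(3).
Here is a minimal sketch; the converter name "CP1140" is an assumption
(glibc and GNU libiconv also accept "IBM-1140"), and you may need to
add -liconv where libiconv is a separate library:

/* sketch.c -- reproduce the IBM-1140 -> Unicode mappings via iconv(3). */
#include <stdio.h>
#include <iconv.h>

static void
convert_byte (unsigned char byte)
{
  /* Open a converter from IBM-1140 to UTF-8.  */
  iconv_t cd = iconv_open ("UTF-8", "CP1140");
  if (cd == (iconv_t) -1)
    {
      perror ("iconv_open");
      return;
    }

  char in[1] = { (char) byte };
  char out[8];
  char *inp = in;
  char *outp = out;
  size_t inleft = sizeof in;
  size_t outleft = sizeof out;

  if (iconv (cd, &inp, &inleft, &outp, &outleft) == (size_t) -1)
    perror ("iconv");
  else
    {
      /* Print the UTF-8 bytes that the single EBCDIC byte produced.  */
      printf ("0x%02X ->", byte);
      for (char *p = out; p < outp; p++)
        printf (" 0x%02X", (unsigned char) *p);
      printf ("\n");
    }
  iconv_close (cd);
}

int
main (void)
{
  convert_byte (0xFF);  /* expect 0xC2 0x9F, the UTF-8 form of U+009F */
  convert_byte (0xDF);  /* expect 0xC3 0xBF, the UTF-8 form of U+00FF */
  return 0;
}

On a system with the conversion tables listed above, this should print
0xFF -> 0xC2 0x9F and 0xDF -> 0xC3 0xBF, i.e. U+009F and U+00FF.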
> COBOL has a notion of "high-value", which is guaranteed to be the
> "highest" value in a character set. The reference manual for COBOL
> from IBM states:
>
>     For alphanumeric data with the EBCDIC collating sequence,
>     [HIGH-VALUE] is X'FF'.

And what about double-byte EBCDIC?

If you want the highest Unicode code point, you may use U+10FFFF. It's
part of the private-use plane 16.

Note that U+00FF is not the "highest" Unicode code point; it's only the
highest code point in the ISO-8859-1 subset.
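What U+10FFFF looks like at the byte level can be computed with nothing
but a C99 compiler; a short sketch:

/* highest.c -- the highest Unicode code point, U+10FFFF, as bytes. */
#include <stdio.h>

int
main (void)
{
  unsigned int cp = 0x10FFFF;   /* highest valid code point, plane 16 */

  /* UTF-8 uses four bytes for code points above U+FFFF.  */
  printf ("UTF-8:  %02X %02X %02X %02X\n",
          0xF0 | (cp >> 18),
          0x80 | ((cp >> 12) & 0x3F),
          0x80 | ((cp >> 6) & 0x3F),
          0x80 | (cp & 0x3F));

  /* UTF-16 uses a surrogate pair for code points above U+FFFF.  */
  unsigned int v = cp - 0x10000;
  printf ("UTF-16: %04X %04X\n",
          0xD800 | (v >> 10),
          0xDC00 | (v & 0x3FF));
  return 0;
}

It prints F4 8F BF BF for UTF-8 and the surrogate pair DBFF DFFF for
UTF-16.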
> Given IBM's statement, to these innocent eyes it looks like a bug.

A COBOL language specification has zero relevance when it comes to
defining the character set conversion tables.

In other words: Like it or not, that's how IBM-1140 is defined.

Bruno

[1] https://en.wikipedia.org/wiki/EBCDIC