Hi Paul,

On Fri, 2015 Sep 25 00:29-0700, Paul Eggert wrote:
> Thanks for checking it. On further thought, I'd rather that we went
> to inline functions, as that would have made ironing out all these
> glitches easier, and anyway inline functions are typically the way to
> go for this sort of thing nowadays. I installed a further patch to do
> that (see URL below); it should also fix the c-ctype bugs you
> mentioned.
>
> http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=43a090ce05f7046457be302ae4a17e83351968b0

When I run test-c-ctype with unsigned chars, a number of assertions
trip, starting at c == -127 (-127 + NCHARS == 129 == 'a'). Here is the
complete list for that value, after removing the abort() from ASSERT():

    .../test-c-ctype.c:82: assertion 'c_isascii (c) == c_isascii (c + NCHARS)' failed
    .../test-c-ctype.c:83: assertion 'c_isalnum (c) == c_isalnum (c + NCHARS)' failed
    .../test-c-ctype.c:84: assertion 'c_isalpha (c) == c_isalpha (c + NCHARS)' failed
    .../test-c-ctype.c:88: assertion 'c_islower (c) == c_islower (c + NCHARS)' failed
    .../test-c-ctype.c:89: assertion 'c_isgraph (c) == c_isgraph (c + NCHARS)' failed
    .../test-c-ctype.c:90: assertion 'c_isprint (c) == c_isprint (c + NCHARS)' failed
    .../test-c-ctype.c:94: assertion 'c_isxdigit (c) == c_isxdigit (c + NCHARS)' failed
    .../test-c-ctype.c:96: assertion 'to_char (c_toupper (c)) == to_char (c_toupper (c + NCHARS))' failed
    .../test-c-ctype.c:142: assertion 'c_islower (c) == 1' failed
    .../test-c-ctype.c:203: assertion 'c_isxdigit (c) == 1' failed
    .../test-c-ctype.c:243: assertion 'to_char (c_toupper (c)) == 'A'' failed

(Line numbers will have minor deltas due to printf() debugging.)

The way the c_isxxxxx() functions are written now makes it a little
difficult for me to determine what's going on, but it should be clearer
to you.

When I run the test with signed chars, there are only a couple of
failures, and they represent an odd corner case of EBCDIC.

In z/OS, '\n' == 0x15, and that is the normal end-of-line marker:

    $ echo x | od -t x1
    0000000000    A7  15
    0000000002

According to

    https://www-304.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/risasc.htm?lang=en

ISO 8859-1 codepoint 0x0A (LF) corresponds to IBM-1047 codepoint 0x15
(NL/newline). IBM-1047 does contain LF, at 0x25. But per IBM, that does
not map to anything in ISO 8859-1. (IANA disagrees, of course: EBCDIC
0x15 == U+0085 and EBCDIC 0x25 == U+000A. But that does you little good
in z/OS.)
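
For anyone who wants to see how their own host treats these codepoints,
here is a minimal sketch of a probe. It is not part of gnulib or of the
test suite; the names newline_code(), classify_byte() and report() are
made up for illustration, and it uses only the standard <ctype.h>
iscntrl() and isspace(), whose answers depend on the current locale.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper, not from gnulib: the code the host assigns to
   '\n' (0x0A on ASCII hosts; z/OS reports 0x15, as described above).  */
int
newline_code (void)
{
  return (unsigned char) '\n';
}

/* Hypothetical helper: bit 0 is set if the host's iscntrl() accepts B,
   bit 1 is set if the host's isspace() accepts B.  */
int
classify_byte (unsigned char b)
{
  return (iscntrl (b) ? 1 : 0) | (isspace (b) ? 2 : 0);
}

/* Print the newline code, then one line for each byte of interest.  */
void
report (void)
{
  static const unsigned char probes[] = { 0x0A, 0x15, 0x25 };
  size_t i;

  printf ("'\\n' == 0x%02X\n", (unsigned) newline_code ());
  for (i = 0; i < sizeof probes / sizeof probes[0]; i++)
    {
      int k = classify_byte (probes[i]);
      printf ("0x%02X: iscntrl=%d isspace=%d\n",
              (unsigned) probes[i], k & 1, (k >> 1) & 1);
    }
}
```

Calling report() from main() prints four lines. What they say is
host-specific: on an ASCII system 0x15 and 0x25 are NAK and '%', so the
output only tells you something about the NL/LF question on an EBCDIC
host.
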
What's more, all the system isxxxxx() functions (including isascii(),
iscntrl() and isspace()) return false for 0x25.

There is probably some ancient history behind the NL<->LF mapping,
seeing as EBCDIC has both characters and ASCII only has the latter. My
hypothesis is that UNIX decided to "emulate" NL using LF, and as UNIX
became popular and linefeeds became standardized as an end-of-line
marker, IBM figured it made more sense to map it to NL (as a functional
equivalent) than to LF (as a pedantically-correct translation).

EBCDIC LF being classified as neither control nor space looks dodgy.
But as it appears that all control and space characters are also
isascii() characters, I suspect IBM for whatever reason did not want to
have a codepoint that would be an exception to that rule.

So to make a long story short: After I add \x15 to, and remove \x25
from, _C_CTYPE_CNTRL for EBCDIC, the test passes in the signed-char
case.


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.