Re: [PATCH] IBM z/OS + EBCDIC support

Daniel Richard G. Fri, 25 Sep 2015 17:25:43 -0700

Hi Paul,

On Fri, 2015 Sep 25 00:29-0700, Paul Eggert wrote:
> Thanks for checking it.  On further thought, I'd rather that we went
> to inline functions, as that would have made ironing out all these
> glitches easier, and anyway inline functions are typically the way to
> go for this sort of thing nowadays.  I installed a further patch to do
> that (see URL below); it should also fix the c-ctype bugs you
> mentioned.
> 
> http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=43a090ce05f7046457be302ae4a17e83351968b0


When I run test-c-ctype with unsigned chars, a number of assertions trip
starting at c == -127. (-127 + NCHARS == 129 == 'a'). Here is the
complete list for that value, after removing the abort() from ASSERT():

    .../test-c-ctype.c:82: assertion 'c_isascii (c) == c_isascii (c + NCHARS)' failed
    .../test-c-ctype.c:83: assertion 'c_isalnum (c) == c_isalnum (c + NCHARS)' failed
    .../test-c-ctype.c:84: assertion 'c_isalpha (c) == c_isalpha (c + NCHARS)' failed
    .../test-c-ctype.c:88: assertion 'c_islower (c) == c_islower (c + NCHARS)' failed
    .../test-c-ctype.c:89: assertion 'c_isgraph (c) == c_isgraph (c + NCHARS)' failed
    .../test-c-ctype.c:90: assertion 'c_isprint (c) == c_isprint (c + NCHARS)' failed
    .../test-c-ctype.c:94: assertion 'c_isxdigit (c) == c_isxdigit (c + NCHARS)' failed
    .../test-c-ctype.c:96: assertion 'to_char (c_toupper (c)) == to_char (c_toupper (c + NCHARS))' failed
    .../test-c-ctype.c:142: assertion 'c_islower (c) == 1' failed
    .../test-c-ctype.c:203: assertion 'c_isxdigit (c) == 1' failed
    .../test-c-ctype.c:243: assertion 'to_char (c_toupper (c)) == 'A'' failed

    (line numbers will have minor deltas due to printf() debugging)

The way the c_isxxxxx() functions are written now makes it a little
difficult for me to determine what's going on, but it should be
clearer to you.

When I run the test with signed chars, there are only a couple failures,
and they represent an odd corner case of EBCDIC.

So in z/OS, '\n' == 0x15, and that is the normal end-of-line marker:

    $ echo x | od -t x1
    0000000000    A7  15
    0000000002

According to

    https://www-304.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/risasc.htm?lang=en

ISO 8859-1 codepoint 0x0A (LF) corresponds to IBM-1047 codepoint
0x15 (NL/newline).

IBM-1047 does contain LF, at 0x25. But per IBM, that does not map to
anything in ISO 8859-1.

(IANA disagrees, of course: EBCDIC 0x15 == U+0085 and
EBCDIC 0x25 == U+000A. But that does you little good in z/OS.)

What's more, all the system isxxxxx() functions---including isascii(),
iscntrl() and isspace()---return false for 0x25.

There is probably some ancient history behind the NL<->LF mapping,
seeing as EBCDIC has both characters and ASCII only has the latter. My
hypothesis is that UNIX decided to "emulate" NL using LF, and as UNIX
became popular and linefeeds became standardized as an end-of-line
marker, IBM figured it made more sense to map it to NL (as a functional
equivalent) than to LF (as a pedantically-correct translation).

EBCDIC LF being classified as neither control nor space looks dodgy. But as
it appears that all control and space characters are also isascii()
characters, I suspect IBM for whatever reason did not want to have a
codepoint that would be an exception to that rule.

So to make a long story short: After I add \x15 and remove \x25 to/from
_C_CTYPE_CNTRL for EBCDIC, the test passes in the signed-char case.


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
