Hi Paul,

On Tue, 2015 Sep 22 12:32-0700, Paul Eggert wrote:
> Thanks for looking into this. I have some questions about the c-ctype
> changes. It appears that the proposed patch defers to the system
> functions (which use the current locale), but that's not the intent of
> c-ctype: it's supposed to correspond to a stripped down POSIX "C"
> locale regardless of the current locale settings. Is there something
> special in z/OS that requires using the system functions? (E.g., does
> the "C" locale behave differently depending on some *other* setting
> regarding character set?)

Mainly, it was the attempt to answer the question "so what specific
variant of EBCDIC are we going to target here?" that led me to use the
system functions. EBCDIC-1047 is favored in z/OS, but EBCDIC-037 is also
popular, and then there are the Russian/Japanese/etc. code pages that
some far-flung users might want.

However, unlike "normal" 8-bit encodings like ISO 8859-#, KOI8-R et al.,
the EBCDIC variants do not agree even on the characters that ASCII puts
in its 7-bit range: characters like "[" and "]" are not consistently
encoded from one variant to the next. We don't have the option of
saying, "Okay, screw all that, we'll just limit ourselves to this common
subset," unless said subset excludes things like punctuation marks.

My view is, it's not worth the hassle. Yes, c-ctype is not supposed to
be locale-dependent. Making it locale-independent on z/OS is going to be
a lot more work, and a lot more code to maintain, and it's not likely
the users of these systems will see a corresponding benefit. I think it
would be better to go with this for now (it's better than nothing), and
if a clear need arises in the future for locale-independent behavior on
z/OS (possibly by selecting an EBCDIC variant at compile time), we can
cross that bridge then.

> With the above in mind, it's not clear what c_isascii should do.
> Should it return 1 for bytes in the range 0..127, or for bytes that
> correspond to ASCII bytes if one assumes the standard translation
> from EBCDIC code page 037 to ASCII? (Is there a standard?) If the
> former, the current code is OK; if the latter, does the system
> isascii always return the same results regardless of locale and do
> these results make sense?

The latter behavior is the right one, IMO. If it were the former, there
would be no point in having an isascii() function at all; you would just
do a range check.

Yes, there's a standard... a whole smorgasbord to choose from ^_^

The system isascii() function is locale-dependent.
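
To make the two readings of c_isascii concrete, here is a rough sketch.
It is not taken from either patch, and both function names are made up
for illustration. The second function assumes that the compiler's
execution character set matches the code page used at run time, which
amounts to picking one EBCDIC variant at compile time. It also covers
only the ASCII characters that can be written portably in a C string
literal, so controls such as ESC and DEL are missing.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: reading (a), a plain numeric range check.  */
static bool
sketch_isascii_range (int c)
{
  return 0 <= c && c <= 127;
}

/* Hypothetical: reading (b), "does this byte encode a character from
   the ASCII repertoire?"  The literals below are encoded by the
   compiler, so the answer follows the compile-time character set and
   ignores the run-time locale.  Incomplete: only NUL, the controls
   that have C escapes, and the printable characters are listed.  */
static bool
sketch_isascii_repertoire (int c)
{
  static const char members[] =
    "\a\b\t\n\v\f\r"
    " !\"#$%&'()*+,-./0123456789:;<=>?@"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
    "abcdefghijklmnopqrstuvwxyz{|}~";

  if (c < 0 || c > UCHAR_MAX)
    return false;
  if (c == 0)
    return true;   /* NUL; strchr would otherwise match the terminator */
  return strchr (members, c) != NULL;
}
```

On an ASCII host the two functions agree for every character listed.
On an EBCDIC host they would not, and the second one gives the right
answer for "[" and "]" only if the code page assumed by the compiler is
the one actually in use.
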
Since even the encoding of "[" and "]" depends on the locale's code
page, I don't see a way to get around this, unless you deliberately
support one EBCDIC variant at the expense of all others.

http://www-01.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/risasc.htm?lang=en

> Anyway, in looking through the code I see that it's hard to test a port
> to EBCDIC because it uses ifdef rather than if, and I do see some
> promotion bugs that you noted but we can fix these with inline functions
> rather than macros (cleaner and safer nowadays), and there are a few
> other style glitches (e.g., boolean values, overuse of >=) so I
> installed the attached patch. This patch assumes EBCDIC control
> characters are either less than ' ' or are all 1 bits, which I think is
> right. The patch also tightens up the tests a bit.

Yes, all control characters appear to be in [\x00-\x3F], but not
everything in that range is a control character. (I remember 0x04 was
not.) I tried making c_iscntrl() a simple range check at first, but that
did not agree with the system iscntrl().

> This patch doesn't address the isascii problem, nor the "something
> special in z/OS" problem, so quite possibly further patches will be
> needed to this module.
> Email had 1 attachment:
> + 0001-c-ctype-port-better-to-EBCDIC.patch
>   21k (text/x-patch)

I'll be happy to test your [revised] patch this evening.


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.