[issue20049] string.lowercase and string.uppercase can contain garbage

2013-12-22 Thread R. David Murray
R. David Murray added the comment: Yes, I definitely think this falls into the category of platform bugs, and we only maintain workarounds for those for "mainstream" OSes. Others need to maintain their own local patches, just as for any other changes that are required to get Python working on

[issue20049] string.lowercase and string.uppercase can contain garbage

2013-12-22 Thread Stefan Krah
Stefan Krah added the comment: IOW, I also support closing this issue. :)

[issue20049] string.lowercase and string.uppercase can contain garbage

2013-12-22 Thread Stefan Krah
Stefan Krah added the comment: Alexander, the "domain of the function" probably refers to the range [-1, 256]. C99: The header <ctype.h> declares several functions useful for classifying and mapping characters.166) In all cases the argument is an int, the value of which shall be representable as a
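
A minimal sketch of the domain rule quoted from C99 above, written in Python for illustration only. The helper name `in_ctype_domain` is hypothetical, and the values `EOF = -1` and `UCHAR_MAX = 255` are assumptions (typical for platforms with 8-bit chars); C only guarantees that EOF is a negative int.

```python
# Assumed platform values; C only requires EOF to be a negative int.
EOF = -1
UCHAR_MAX = 255  # assumes 8-bit char


def in_ctype_domain(c: int) -> bool:
    """Return True if c is a valid argument for islower() and friends:
    either EOF or a value representable as an unsigned char."""
    return c == EOF or 0 <= c <= UCHAR_MAX


# A byte such as 0xC4 is in the domain when passed as an unsigned value (196),
# but not when a signed char has sign-extended it to -60.
checks = {
    "EOF": in_ctype_domain(EOF),
    "196": in_ctype_domain(196),
    "-60": in_ctype_domain(-60),
}
```

Under these assumptions, a caller that casts to unsigned char before calling the classification function always stays inside the domain.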

[issue20049] string.lowercase and string.uppercase can contain garbage

2013-12-22 Thread Antoine Pitrou
Antoine Pitrou added the comment: As to whether we will add a workaround for this in Python: - Python follows POSIX correctly here, and no issue was reported in mainstream OSes such as Linux, OS X or the *BSDs - this only exists in 2.7, which is in extended maintenance mode (it's the last of

[issue20049] string.lowercase and string.uppercase can contain garbage

2013-12-22 Thread Antoine Pitrou
Antoine Pitrou added the comment: To elaborate a bit further, I agree with the following statement in the aforementioned [illumos-devel] discussion thread: """In further explanation, the isalpha() and friends *should* probably return false for the value 196, or any other byte with high order bit s

[issue20049] string.lowercase and string.uppercase can contain garbage

2013-12-22 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I've discussed this once more. > > From islower man page: > > RETURN VALUES > If the argument to any of the character handling macros is > not in the domain of the function, the result is undefined. This is not the wording of the POSIX spec: h

[issue20049] string.lowercase and string.uppercase can contain garbage

2013-12-21 Thread Alexander Pyhalov
Alexander Pyhalov added the comment: I've discussed this once more. From islower man page: RETURN VALUES If the argument to any of the character handling macros is not in the domain of the function, the result is undefined. And (char)128-255 are not legal UTF-8 (at least what I
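
The point about bytes 128-255 can be checked directly. The sketch below (Python 3, with a hypothetical helper name `decodes_alone_as_utf8`) tries to decode each byte value on its own: values below 0x80 are ASCII and decode, while every value from 0x80 to 0xFF is either a continuation byte, an incomplete lead byte, or invalid, so none of them forms a character by itself.

```python
def decodes_alone_as_utf8(byte_value: int) -> bool:
    """Return True if the single byte is a complete, valid UTF-8 sequence."""
    try:
        bytes([byte_value]).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Every ASCII byte is a complete UTF-8 sequence on its own.
ascii_all_valid = all(decodes_alone_as_utf8(b) for b in range(0x00, 0x80))

# No byte with the high bit set is a complete sequence on its own.
high_any_valid = any(decodes_alone_as_utf8(b) for b in range(0x80, 0x100))
```

This only shows that a lone high byte is not a UTF-8 character; it says nothing about what a particular libc's islower() returns for such a byte.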

[issue20049] string.lowercase and string.uppercase can contain garbage

2013-12-21 Thread Alexander Pyhalov
Alexander Pyhalov added the comment: Honestly, I don't understand locale-related things well enough. But I received this explanation when I discussed a similar issue on the illumos developers mailing list. http://comments.gmane.org/gmane.os.illumos.devel/14193 2013/12/22 Antoine Pitrou > > Antoine Pit

[issue20049] string.lowercase and string.uppercase can contain garbage

2013-12-21 Thread Antoine Pitrou
Antoine Pitrou added the comment: > The reason is that with UTF-8 locale islower()/isupper() and similar > functions are not expected to work with non-ascii symbols. Can you explain why?

[issue20049] string.lowercase and string.uppercase can contain garbage

2013-12-21 Thread R. David Murray
R. David Murray added the comment: In Python 2, string.lowercase and string.uppercase are locale dependent. This isn't really all that useful in practice, which is why it was dropped in Python 3. The proposed fix might be correct, *if* utf-8 is checked for (see, e.g., Issue 6525), but...do you h
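
To make the "*if* utf-8 is checked for" condition concrete, here is a rough Python 3 sketch. It is not the patch attached to the issue: the helper `clean_case_table` and its `codeset` parameter are hypothetical, and the "polluted" table is made-up sample data standing in for what a locale-dependent classifier might return.

```python
import string


def clean_case_table(table: bytes, codeset: str) -> bytes:
    """Drop bytes >= 0x80 from a case table, but only for a UTF-8 codeset.

    In a single-byte codeset such as ISO 8859-1 the high bytes can be
    genuine letters, so the table is returned unchanged.
    """
    if codeset.replace("-", "").lower() == "utf8":
        return bytes(b for b in table if b < 0x80)
    return table


# Made-up example of a lowercase table with stray high bytes appended.
polluted = string.ascii_lowercase.encode("ascii") + bytes([0xAA, 0xB5, 0xBA])

utf8_table = clean_case_table(polluted, "UTF-8")
latin1_table = clean_case_table(polluted, "ISO8859-1")
```

In Python 3 the question does not arise, since string.ascii_lowercase and string.ascii_uppercase are fixed ASCII constants regardless of locale.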

[issue20049] string.lowercase and string.uppercase can contain garbage

2013-12-21 Thread Alexander Pyhalov
New submission from Alexander Pyhalov: When Python 2.6 (or 2.7) is compiled with _XOPEN_SOURCE=600 on illumos, string.lowercase and string.uppercase contain garbage when a UTF-8 locale is used. (OpenIndiana bug report - https://www.illumos.org/issues/4411 ). The reason is that with UTF-8 locale isl