On 5/24/2010 10:42 AM, MRAB wrote:
Mark Dickinson wrote:
Digging a bit deeper, it looks like these methods are using the Simple_{Upper,Lower,Title}case_Mapping properties described at http://www.unicode.org/Public/5.1.0/ucd/UCD.html, i.e. fields 12, 13 and 14 of the Unicode character data (UnicodeData.txt). You can see this in the source in Tools/unicode/makeunicodedata.py, the Python script that generates the database of Unicode properties. It contains code like:

    if record[12]:
        upper = int(record[12], 16)
    else:
        upper = char
    if record[13]:
        lower = int(record[13], 16)
    else:
        lower = char
    if record[14]:
        title = int(record[14], 16)
    ...

and so on.

I agree that it might be desirable for these operations to produce the multicharacter equivalents (for example, uppercasing "ß" to "SS"). That idea looks like a tough sell, though: apart from backwards-compatibility concerns (which could probably be worked around somehow), it looks as though it would require significant effort to implement.

If we were to make such a change, I think we should also cater for locale-specific case changes (passing the locale to 'upper', 'lower' and 'title'). For example, normally "i".upper() returns "I", but in Turkish it should return "İ" (the uppercase form of lowercase dotted i is uppercase dotted I).
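To make the fallback in the makeunicodedata.py excerpt concrete, here is a minimal standalone sketch. The function name simple_case_mappings and the two hard-coded records are illustrative assumptions written in the UnicodeData.txt layout, not code taken from CPython. An empty mapping field means the character maps to itself, which is why "ß" comes back unchanged: its uppercase form "SS" is two characters long, so it has no simple mapping.

```python
# Sample records in the UnicodeData.txt layout (illustrative): 15
# semicolon-separated fields, where fields 12, 13 and 14 hold the simple
# uppercase, lowercase and titlecase mappings as hex code points.
SAMPLE_RECORDS = [
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;;;;",
]


def simple_case_mappings(line):
    """Return (char, upper, lower, title) code points for one record.

    Hypothetical helper mirroring the quoted fallback logic: an empty
    field means the character is its own mapping.
    """
    record = line.split(";")
    char = int(record[0], 16)
    upper = int(record[12], 16) if record[12] else char
    lower = int(record[13], 16) if record[13] else char
    title = int(record[14], 16) if record[14] else char
    return char, upper, lower, title


for line in SAMPLE_RECORDS:
    char, upper, lower, title = simple_case_mappings(line)
    print(repr(chr(char)), "upper:", repr(chr(upper)),
          "lower:", repr(chr(lower)), "title:", repr(chr(title)))
```

The multicharacter forms are defined separately, in SpecialCasing.txt. For what it's worth, Python 3.3 and later switched the str case methods to the full mappings, so "ß".upper() returns "SS" there.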
Given that the current (simple) functions implement standard-defined functions, I think any change should be to *add* new 'complex-case-change' functions.
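One way to picture that suggestion, combined with MRAB's locale point: leave str.upper() and str.lower() exactly as they are and add separate functions that take a language code. The sketch below is an assumption throughout: upper_for_locale, lower_for_locale and the lookup tables are hypothetical names, and only a simplified subset of the Turkish/Azerbaijani dotted and dotless i rules from SpecialCasing.txt is covered.

```python
from typing import Optional

# Hypothetical language-specific tables: a simplified subset of the
# Turkish ('tr') and Azerbaijani ('az') rules in SpecialCasing.txt.
# The rules involving U+0307 COMBINING DOT ABOVE are omitted.
_TURKIC_LANGS = {"tr", "az"}
_TURKIC_UPPER = {"i": "\u0130"}                 # i -> İ (capital I with dot above)
_TURKIC_LOWER = {"I": "\u0131", "\u0130": "i"}  # I -> ı (dotless i), İ -> i


def upper_for_locale(s: str, lang: Optional[str] = None) -> str:
    """Uppercase s, applying Turkic rules first when lang is 'tr' or 'az'."""
    if lang in _TURKIC_LANGS:
        s = "".join(_TURKIC_UPPER.get(ch, ch) for ch in s)
    # Everything else falls through to the built-in, locale-independent method.
    return s.upper()


def lower_for_locale(s: str, lang: Optional[str] = None) -> str:
    """Lowercase s, applying Turkic rules first when lang is 'tr' or 'az'."""
    if lang in _TURKIC_LANGS:
        s = "".join(_TURKIC_LOWER.get(ch, ch) for ch in s)
    return s.lower()


print(upper_for_locale("istanbul", "tr"))
print(upper_for_locale("istanbul"))
print(lower_for_locale("ISPARTA", "tr"))
```

Because the built-in methods are untouched, existing code keeps getting the standard-defined results, and only callers that opt in see the language-specific behaviour.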
Terry Jan Reedy
