I just discovered that, in all versions of Python as far back as I
have access to (2.0), \uXXXX escapes are interpreted inside raw
unicode strings. Thus:

>>> a = ur"\u1234"
>>> len(a)
1
>>>

Contrast this with:

>>> a = ur"\x12"
>>> len(a)
4
>>>

The \U escape has the same behavior, in versions that support it.

Does anyone remember why it is done this way? The reference manual
describes this behavior, but doesn't give an explanation:

"""
When an "r" or "R" prefix is used in conjunction with a "u" or "U"
prefix, then the \uXXXX and \UXXXXXXXX escape sequences are processed
while all other backslashes are left in the string. For example, the
string literal ur"\u0062\n" consists of three Unicode characters:
`LATIN SMALL LETTER B', `REVERSE SOLIDUS', and `LATIN SMALL LETTER N'.
Backslashes can be escaped with a preceding backslash; however, both
remain in the string. As a result, \uXXXX escape sequences are only
recognized when there are an odd number of backslashes.
"""
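
For readers on Python 3, where the ur prefix no longer exists, the rule
quoted above can still be tried out through the raw_unicode_escape codec.
This is only a sketch: it assumes the codec applies the same rule as the
literal syntax described in the manual, and decode_raw is a hypothetical
helper name introduced here for illustration.

```python
# Assumption: the raw_unicode_escape codec follows the same rule that the
# quoted manual text gives for ur"..." literals.

def decode_raw(source: bytes) -> str:
    """Hypothetical helper: decode bytes as the body of a raw unicode literal."""
    return source.decode("raw_unicode_escape")

# One backslash (odd count): the \uXXXX escape is processed.
one_escape = decode_raw(rb"\u1234")

# \x is not special in raw mode: backslash, 'x', '1', '2' all remain.
hex_escape = decode_raw(rb"\x12")

# The manual's example: 'b', a backslash, and 'n'.
manual_example = decode_raw(rb"\u0062\n")

# Two backslashes (even count): nothing is processed, both backslashes remain.
doubled = decode_raw(rb"\\u1234")

# Three backslashes (odd count): two remain, then the escape is processed.
tripled = decode_raw(rb"\\\u1234")

print(len(one_escape), len(hex_escape), len(manual_example),
      len(doubled), len(tripled))
# prints: 1 4 3 7 3
```

The doubled and tripled cases illustrate the "odd number of backslashes"
sentence, which is the least obvious part of the quoted paragraph.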

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
