Marc-Andre Lemburg <[email protected]> added the comment:

Martin v. Löwis wrote:
>
> Martin v. Löwis <[email protected]> added the comment:
>
> This is not a bug, see
>
> http://www.unicode.org/reports/tr44/#Numeric_Value
>
> Characters have a Numeric_Type property of either null, Decimal, Digit, or
> Numeric. For non-Unihan characters, this is denoted by filling out either no
> column, or (6, 7, and 8), or (7 and 8), or (8), respectively, as implemented by
> makeunicodedata.py. Unihan characters have only null or Numeric as their
> Numeric_Type property, never Decimal nor Digit, see
>
> http://www.unicode.org/reports/tr44/#Numeric_Type_Han
>
> Therefore, it is correct that digit() raises a ValueError for U+4e09.

You're right. I guess this is a bug in the UCD or TR44/TR38 itself.

It looks like the numeric properties are not separated in the Unihan
database in the same way they are for the standard UCD. Unihan
separates based on usage context, whereas UCS takes a parsing
approach.

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue10575>
_______________________________________
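
For reference, the Decimal / Digit / Numeric distinction discussed above can be checked against the unicodedata module. This is only a minimal sketch: numeric_properties is a hypothetical helper name (not part of the stdlib), and the sample characters other than U+4e09 were picked here for illustration.

```python
import unicodedata


def numeric_properties(ch):
    """Return (decimal, digit, numeric) for ch.

    Hypothetical helper for illustration: each slot is None when the
    corresponding unicodedata lookup raises ValueError.
    """
    def lookup(func):
        try:
            return func(ch)
        except ValueError:
            return None

    return (lookup(unicodedata.decimal),
            lookup(unicodedata.digit),
            lookup(unicodedata.numeric))


# One sample per Numeric_Type: Decimal, Digit, Numeric, and a Unihan
# character (which can only be null or Numeric).
for ch in ("3", "\u00b2", "\u00bd", "\u4e09"):
    print("U+%04X" % ord(ch), numeric_properties(ch))
```

U+0033 fills all three slots, U+00B2 (SUPERSCRIPT TWO) only digit and numeric, and both U+00BD and U+4e09 only numeric, so digit() raising ValueError for U+4e09 matches the Unihan rule quoted above.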
