Alexander Belopolsky wrote:
> On Tue, Nov 16, 2010 at 1:57 PM, M.-A. Lemburg <m...@egenix.com> wrote:
>> Alexander Belopolsky wrote:
>>> On Tue, Nov 16, 2010 at 1:06 PM, M.-A. Lemburg <m...@egenix.com> wrote:
>>> ..
>>>> Now, we can't use a macro for [PyUnicode_GetMax()], since the information
>>>> has to be available as callable in order to applications or extensions
>>>> to use it (without recompile).
>>>>
>>>
>>> .. but it *is* a macro resolving to either PyUnicodeUCS2_GetMax or
>>> PyUnicodeUCS4_GetMax.
>>
>> That doesn't count :-) It's only a trick to prevent external code
>> from using the wrong Unicode APIs.
>>
>> There still is a real function behind the renaming.
>>
>>> What is the scenario when may want to change
>>> what PyUnicodeUCS?_GetMax return and have extensions pick up the
>>> change without a recompile?
>>
>> If an extensions uses the stable ABI, it will want to know
>> whether the interpreter was built for UCS2 or UCS4 (even if
>> it doesn't use the Unicode APIs directly).
>>
>>> UCS2 case will certainly never change
>>> since it is already 0xFFFF. Is it possible that USC4 will be expanded
>>> beyond 0x10FFFF?
>>
>> Well, the Unicode Consortium decided to not go beyond 0x10FFFF,
>> but then you never know... when they started out on the quest,
>> 16 bits appeared more than enough, but they found out relatively
>> quickly that the Asian scripts had enough code points to easily
>> fill that space.
>>
>> Once space is available, it tends to get used sooner or later :-)
>>
>>> Note that we can have both a macro and a function
>>> version. This is fairly standard practice in Python C-API.
>>
>> Sure, but what for ?
>
> Note that PyUnicode_FromOrdinal() is documented (in unicodeobject.h)
> as follows without a reference to PyUnicode_GetMax():
>
> """
> Create a Unicode Object from the given Unicode code point ordinal.
>
> The ordinal must be in range(0x10000) on narrow Python builds
> (UCS2), and range(0x110000) on wide builds (UCS4). A ValueError is
> raised in case it is not.
> """
>
> The actual implementation actually checks UCS4 range only.
>
>     if (ordinal < 0 || ordinal > 0x10ffff) {
>         PyErr_SetString(PyExc_ValueError,
>                         "chr() arg not in range(0x110000)");
>         return NULL;
>     }
>
> This actually looks like a bug:
>
> >>> len(chr(0x10FFFF))
> 2
>
> (on a USC2 build.)

Yes, it's a documentation bug. I guess someone forgot to update the
comment in unicodeobject.h after the change to have chr()/unichr()
return a 2-char string instead of a 1-char string for non-BMP code points.

> Also, I think PyUnicode_FromOrdinal() should take Py_UNICODE argument
> rather than int.

No, an ordinal is a number, not a typed value. We have
PyUnicode_FromUnicode() to create strings from Py_UNICODE* arrays.

-- 
Marc-Andre Lemburg
eGenix.com
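
Note on the 2-char result: the length of 2 reported above for
chr(0x10FFFF) on a narrow build is what UTF-16 surrogate encoding would
produce for any code point above 0xFFFF. The following is only a minimal
sketch of that arithmetic, not the interpreter's actual code. It assumes
Python 3.3 or later, where the narrow/wide build distinction no longer
exists, so the split is computed by hand; the helper name
to_surrogate_pair is hypothetical and was chosen for this illustration.

```python
def to_surrogate_pair(ordinal):
    """Split a non-BMP code point into (high, low) UTF-16 surrogates.

    Hypothetical helper for illustration; not part of the Python C-API.
    """
    if not 0x10000 <= ordinal <= 0x10FFFF:
        raise ValueError("ordinal not in range(0x10000, 0x110000)")
    offset = ordinal - 0x10000        # 20-bit value to distribute
    high = 0xD800 + (offset >> 10)    # upper 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x10FFFF)
print(hex(high), hex(low))  # prints: 0xdbff 0xdfff

# Cross-check against the built-in UTF-16 codec: two 16-bit code units.
encoded = chr(0x10FFFF).encode("utf-16-be")
print(len(encoded) // 2)  # prints: 2
```

Code points up to 0xFFFF fit in a single 16-bit unit, which is why only
non-BMP ordinals give a 2-char string on a UCS2 build.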