Re: [Cython] Py_UNICODE* string support
On Sun, 03 Mar 2013 13:52:49 +0600, Stefan Behnel wrote:

> Are you aware that Py_UNICODE is deprecated as of Py3.3?
>
> http://docs.python.org/3.4/c-api/unicode.html
>
> Your changes look a bit excessive for supporting something that's
> inefficient in recent Python versions and basically "dead".

Yes, I'm well aware of the Py3.3 changes, but consider this:

1. _All_ system APIs on Windows, old, new and in-between, use UTF-16 in the
   form of zero-terminated 2-byte wchar_t* strings (on Windows Py_UNICODE is
   _always_ aliased to wchar_t specifically for this reason).
   Whatever happens to Python internals, the need to interoperate with
   UTF-16 based platforms won't go away.

2. The Py_UNICODE family of APIs remains the recommended way to interoperate
   with Windows. (So said the author of PEP 393 himself; I could find the
   relevant discussion on python-dev.)

3. It is not _that_ inefficient. Actually, it has the same efficiency as the
   UTF-8 related APIs (which have to be used on UTF-8 platforms like most
   *nix systems). UTF-8 allows sharing of the ASCII buffer and has to convert
   UCS2/UCS4; Py_UNICODE shares the UCS2 buffer (assuming a narrow build)
   and has to convert ASCII.

One alternative to Py_UNICODE that I have rejected is using Python's
wchar_t support. It's practically useless for these reasons:

1) wchar_t APIs do not exist in Py2 and have to be implemented for
   compatibility.
2) Implementing them brings in all the pain of the nonportable wchar_t type
   (on *nix systems in general), whereas the primary users would target
   Windows, where the (pretty horrible) wchar_t portability workarounds
   would be dead code.
3) wchar_t APIs do not offer a zero-copy option and do not manage
   the memory for us.

The changes are some 50 lines of code, not counting the tests.
I wouldn't call that excessive. And they mostly mirror existing code,
no trickery of any kind.

Built-in Py_UNICODE* support also means that users would be shielded
from the 3.3 changes and Cython is free to optimize string handling in
the future.

Believe me, nobody calls Py_UNICODE APIs because they want to;
they just have to.

Best regards,
Nikita Nemkin
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
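The buffer-sharing argument in point 3 above can be modelled with a minimal Python sketch. It does not inspect CPython internals; it only assumes the PEP 393 idea that a string is stored with 1, 2 or 4 bytes per character depending on its highest code point. The helper names `storage_kind`, `zero_copy_targets` and `to_wide_z` are hypothetical and exist only for this illustration.

```python
def storage_kind(text):
    """Bytes per character a PEP 393 style string would need (1, 2 or 4)."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


def zero_copy_targets(text):
    """Which C views could, in this model, reuse the internal buffer as-is."""
    return {
        # UTF-8 can share the buffer only for pure ASCII text.
        "utf8": all(ord(ch) < 0x80 for ch in text),
        # A 2-byte wchar_t view can share the buffer only for 2-byte storage.
        "utf16": storage_kind(text) == 2,
    }


def to_wide_z(text):
    """Zero-terminated UTF-16-LE bytes, the shape Windows wide APIs expect."""
    return text.encode("utf-16-le") + b"\x00\x00"


ascii_case = zero_copy_targets("abc")
cyrillic_case = zero_copy_targets("\u043f\u0440\u0438\u0432\u0435\u0442")
wide_hi = to_wide_z("hi")
```

In this model ASCII text is free for UTF-8 consumers and needs widening for UTF-16 consumers, while text stored in 2-byte form is the other way round, which is the symmetry the mail describes.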
Re: [Cython] Py_UNICODE* string support
Nikita Nemkin, 03.03.2013 09:25:
> On Sun, 03 Mar 2013 13:52:49 +0600, Stefan Behnel wrote:
>> Are you aware that Py_UNICODE is deprecated as of Py3.3?
>>
>> http://docs.python.org/3.4/c-api/unicode.html
>>
>> Your changes look a bit excessive for supporting something that's
>> inefficient in recent Python versions and basically "dead".
>
> Yes, I'm well aware of Py3.3 changes, but consider this:
>
> 1. _All_ system APIs on Windows, old, new and in-between, use UTF-16 in the
>    form of zero-terminated 2-byte wchar_t* strings (on Windows Py_UNICODE is
>    _always_ aliased to wchar_t specifically for this reason).
>    Whatever happens to Python internals, the need to interoperate with
>    UTF-16 based platforms won't go away.

Ok, fine with me.

Your changes look fairly reasonable, especially for a first try. I have the
following comments.

1) I would like to get rid of UnicodeConst. A Py_UNICODE* is not different
from any other C array, except that it can coerce to and from Unicode
strings. So the representation of a literal should be a (properly reference
counted) Python Unicode object, and users would be allowed to cast them to
<Py_UNICODE*>, just as we support it for <char*> and bytes.

2) non-BMP literals should be supported by representing them as normal
Unicode strings and creating the Py_UNICODE representation at need (i.e.
explicitly through a cast, at runtime). Py_UNICODE[] literals are simply not
portable.

3) __Pyx_Py_UNICODE_strlen() is ok, but only for the special case that all
we have is a Py_UNICODE*. As long as we are dealing with Unicode string
objects, that won't be needed, so len() should be constant time in the
normal case instead of linear time.

4) most of the changes in PyrexTypes.py and ExprNodes.py look ok. I would
eventually like to see a couple of refactorings on these sections (because
the special cases add up over time), but that's not required for this change.

So, the basic idea would be to use Unicode strings and their (optional)
internal representation as Py_UNICODE[] instead of making Py_UNICODE[] a
first class data type. And then go from there and optimise certain things to
use the unpacked array directly, so that users won't need to put explicit
C-API calls into their code.

Stefan
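The cost difference behind comment 3 can be sketched in plain Python. This is only a stand-in, not the Cython helper itself: a list of integers with a trailing 0 models a bare Py_UNICODE*, and the function names `py_unicode_strlen` and `as_zero_terminated` are made up for the illustration.

```python
def py_unicode_strlen(units):
    """Count code units before the 0 terminator; the cost grows with length."""
    length = 0
    while units[length] != 0:
        length += 1
    return length


def as_zero_terminated(text):
    """Model a Py_UNICODE* buffer as a list of code points plus a terminator.

    Assumes BMP-only text, so that one character is exactly one code unit.
    """
    return [ord(ch) for ch in text] + [0]


text = "hello"
buffer = as_zero_terminated(text)

# Both give the same answer, but len(text) reads a stored size while the
# scan has to walk the whole buffer up to the terminator.
scanned_length = py_unicode_strlen(buffer)
stored_length = len(text)
```

With only the raw pointer there is nothing but the terminator to go by, so the scan is unavoidable; with a string object the length is already known.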
Re: [Cython] About IndexNode and unicode[index]
2013/3/2 Stefan Behnel :
> Stefan Behnel, 28.02.2013 22:16:
>
> https://github.com/scoder/cython/commit/cc4f7daec3b1f19b5acaed7766e2b6f86902ad94
>
> Stefan

I tried to build with that change. The `unicode_indexing` and `index` tests
pass.
Re: [Cython] Py_UNICODE* string support
On Sun, 03 Mar 2013 15:32:36 +0600, Stefan Behnel wrote:

> 1) I would like to get rid of UnicodeConst. A Py_UNICODE* is not different
> from any other C array, except that it can coerce to and from Unicode
> strings. So the representation of a literal should be a (properly reference
> counted) Python Unicode object, and users would be allowed to cast them to
> <Py_UNICODE*>, just as we support it for <char*> and bytes.

I understand the idea. Since Python unicode literals are implicitly
coercible to Py_UNICODE*, there appears to be no need for C-level
Py_UNICODE[] literals. Indeed, client code will look exactly (!) the same
whether they are supported or not.

Except when it comes to nogil. (For example, native callbacks are almost
guaranteed to be nogil.) Hiding Python operations in what appears to be
pure C-level code will break users' assumptions.
This is the #1 reason why I went for C-level literals.

The #2 reason is efficiency on Py3.3. C-level literals don't need
conversions and don't call any conversion APIs.

> 2) non-BMP literals should be supported by representing them as normal
> Unicode strings and creating the Py_UNICODE representation at need (i.e.
> explicitly through a cast, at runtime). Py_UNICODE[] literals are simply not
> portable.

Py_UNICODE[] literals can be made fully portable if non-BMP ones are
wrapped like this:

#ifdef Py_UNICODE_WIDE
static const Py_UNICODE k_xxx[] = { <UCS4 code points>, 0 };
#else
static const Py_UNICODE k_xxx[] = { <UTF-16 code units>, 0 };
#endif

Literals containing only BMP chars are already portable and don't need
this wrapping.

> 3) __Pyx_Py_UNICODE_strlen() is ok, but only for the special case that all
> we have is a Py_UNICODE*. As long as we are dealing with Unicode string
> objects, that won't be needed, so len() should be constant time in the
> normal case instead of linear time.

len(Py_UNICODE*) simply mirrors len(char*). Its purpose is to provide a
platform-independent Py_UNICODE_strlen (which is Py3 only and deprecated
in 3.3).

> So, the basic idea would be to use Unicode strings and their (optional)
> internal representation as Py_UNICODE[] instead of making Py_UNICODE[] a
> first class data type. And then go from there and optimise certain things to
> use the unpacked array directly, so that users won't need to put explicit
> C-API calls into their code.

Please reconsider your decision wrt C-level literals.
I believe that nogil code and a bit of efficiency (on 3.3) justify their
existence. (char* literals do have C-level literals, Py_UNICODE* is in
the same basket when it comes to Windows code).
The code to support them is also small and well-contained.
I've updated my pull request to fully support non-BMP Py_UNICODE[]
literals.

If you are still not convinced, so be it, I'll drop C-level literal support.

Best regards,
Nikita Nemkin

PS. I made a false claim in the previous mail. (Some of) Python's wchar_t
APIs do exist in Py2. But they won't manage the memory automatically anyway.
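Why a non-BMP literal needs two array bodies while a BMP-only one does not can be shown with a small Python sketch. It is an illustration under assumptions, not the code generator from the pull request: the function `code_units` and its `wide` flag are hypothetical names, with `wide=True` standing for a build where Py_UNICODE_WIDE is defined.

```python
import struct


def code_units(text, wide):
    """Return the zero-terminated code unit array for a string literal.

    wide=True  models a 4-byte Py_UNICODE: one unit per code point.
    wide=False models a 2-byte Py_UNICODE: UTF-16, with surrogate pairs
               for characters outside the BMP.
    """
    if wide:
        units = [ord(ch) for ch in text]
    else:
        raw = text.encode("utf-16-le")
        units = list(struct.unpack("<%dH" % (len(raw) // 2), raw))
    return units + [0]


bmp_wide = code_units("ab", wide=True)
bmp_narrow = code_units("ab", wide=False)

# U+1F600 lies outside the BMP, so the narrow form needs a surrogate pair.
astral_wide = code_units("a\U0001F600", wide=True)
astral_narrow = code_units("a\U0001F600", wide=False)
```

For BMP-only text both branches yield the same array, so a single initializer is portable; only literals with characters above U+FFFF produce different bodies and need the conditional wrapping.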
Re: [Cython] To Add datetime.pxd to cython.cpython
2013/3/2 Stefan Behnel :
> Hi,
>
> the last pull request looks good to me now.
>
> https://github.com/cython/cython/pull/189
>
> Any more comments on it?
>
> Stefan

As was suggested earlier, I added an `import_datetime` inline function to
initialize the PyDateTime C API instead of directly using the "non-native" C
macros from datetime.h.
Now you call `import_datetime()` first, in the same way as `import_array()`
is called with `numpy`.
This approach looks natural in the light of experience with numpy.

Zaur Shibzukhov
Re: [Cython] To Add datetime.pxd to cython.cpython
2013/3/3 Zaur Shibzukhov :
> 2013/3/2 Stefan Behnel :
>> Hi,
>>
>> the last pull request looks good to me now.
>>
>> https://github.com/cython/cython/pull/189
>>
>> Any more comments on it?
>
> As was suggested earlier, I added an `import_datetime` inline function to
> initialize the PyDateTime C API instead of directly using the "non-native"
> C macros from datetime.h.
> Now you call `import_datetime()` first, in the same way as `import_array()`
> is called with `numpy`.
> This approach looks natural in the light of experience with numpy.

I made some performance comparisons. Here is an example for dates.

Here is the test code:

# test_date.pyx

from cpython.datetime cimport import_datetime, date_new, date

import_datetime()

from datetime import date as pydate

def test_date1():
    cdef list lst = []
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = pydate(year, month, day)
                lst.append(d)
    return lst

def test_date2():
    cdef list lst = []
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = date(year, month, day)
                lst.append(d)
    return lst

def test_date3():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = date_new(year, month, day)
                lst.append(d)
    return lst

def test1():
    l = test_date1()
    return l

def test2():
    l = test_date2()
    return l

def test3():
    l = test_date3()
    return l

Here are the timings:

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test1" "test1()"
50 loops, best of 5: 83.2 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test2" "test2()"
50 loops, best of 5: 74.7 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test3" "test3()"
50 loops, best of 5: 20.9 msec per loop

OSX 10.6.8 64 bit python 3.2

Shibzukhov Zaur
Re: [Cython] To Add datetime.pxd to cython.cpython
2013/3/3 Zaur Shibzukhov :
> I made some performance comparisons. Here is an example for dates.
> [...]
> 50 loops, best of 5: 20.9 msec per loop
>
> OSX 10.6.8 64 bit python 3.2

A more accurate test, with the loop variables typed in all three functions:

# coding: utf-8

from cpython.datetime cimport import_datetime, date_new, date

import_datetime()

from datetime import date as pydate

def test_date1():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = pydate(year, month, day)
                lst.append(d)
    return lst

def test_date2():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = date(year, month, day)
                lst.append(d)
    return lst

def test_date3():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = date_new(year, month, day)
                lst.append(d)
    return lst

def test1():
    l = test_date1()
    return l

def test2():
    l = test_date2()
    return l

def test3():
    l = test_date3()
    return l

Timings:

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test1" "test1()"
50 loops, best of 5: 83.3 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test2" "test2()"
50 loops, best of 5: 74.6 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test3" "test3()"
50 loops, best of 5: 20.8 msec per loop

Shibzukhov Zaur
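To put the three timings into perspective, the arithmetic can be spelled out in a short Python sketch. The numbers are the ones reported in the mail above (83.3, 74.6 and 20.8 msec per loop); the dictionary keys and helper names are labels chosen for this illustration only.

```python
# Best-of-5 timings per loop, in milliseconds, as reported above.
timings_msec = {
    "pydate": 83.3,    # datetime.date imported at the Python level
    "date": 74.6,      # date type cimported from cpython.datetime
    "date_new": 20.8,  # C-level constructor from cpython.datetime
}

# Each loop builds one date per (year, month, day) combination.
objects_per_loop = len(range(1000, 2001)) * len(range(1, 13)) * len(range(1, 20))


def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline."""
    return timings_msec[baseline] / timings_msec[candidate]


def nanoseconds_per_object(name):
    """Average cost of creating and appending a single date object."""
    return timings_msec[name] * 1e6 / objects_per_loop


for name in timings_msec:
    print("%-8s %5.2fx  %6.1f ns/object"
          % (name, speedup("pydate", name), nanoseconds_per_object(name)))
```

The per-object figure includes the loop overhead and the `lst.append` call, so it is an upper bound on the construction cost rather than a measurement of the constructor alone.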
Re: [Cython] Py_UNICODE* string support
Nikita Nemkin, 03.03.2013 14:40:
> Please reconsider your decision wrt C-level literals.
> I believe that nogil code and a bit of efficiency (on 3.3) justify their
> existence. (char* literals do have C-level literals, Py_UNICODE* is in
> the same basket when it comes to Windows code).
> The code to support them is also small and well-contained.
> I've updated my pull request to fully support non-BMP Py_UNICODE[]
> literals.

Ok, I think it's ok now. I can accept the special casing of Py_UNICODE
literals, it actually adds value.

As one little nit-pick, may I ask you to rename the new name references to
"unicode" into "py_unicode" in your code? For example, "is_unicode",
"get_unicode_const", "unicode_const_index", etc. Given that Py_UNICODE is
no longer the native equivalent of Python's unicode type in Py3.3, I'd like
to avoid confusion in the code. The name "unicode" is much more likely to
refer to the builtin Python type than to a native C type when it appears in
Cython's sources.

Stefan
Re: [Cython] Py_UNICODE* string support
Stefan Behnel, 03.03.2013 20:41:
> Nikita Nemkin, 03.03.2013 14:40:
>> Please reconsider your decision wrt C-level literals.
>> I believe that nogil code and a bit of efficiency (on 3.3) justify their
>> existence. (char* literals do have C-level literals, Py_UNICODE* is in
>> the same basket when it comes to Windows code).
>> The code to support them is also small and well-contained.
>> I've updated my pull request to fully support for non-BMP Py_UNICODE[]
>> literals.
>
> Ok, I think it's ok now. I can accept the special casing of Py_UNICODE
> literals, it actually adds a value.
>
> As one little nit-pick, may I ask you to rename the new name references to
> "unicode" into "py_unicode" in your code? For example, "is_unicode",
> "get_unicode_const", "unicode_const_index", etc. Given that Py_UNICODE is
> no longer the native equivalent of Python's unicode type in Py3.3, I'd like
> to avoid confusion in the code. The name "unicode" is much more likely to
> refer to the builtin Python type than to a native C type when it appears in
> Cython's sources.

Oh, and yet another thing: could you write up some documentation for this in
docs/src/tutorial/strings.rst ? Basically a Windows/wchar_t related section,
that also warns about the inefficiency in Py3.3, so that users don't
accidentally assume it's efficient for anything that needs to be portable.

Stefan
Re: [Cython] To Add datetime.pxd to cython.cpython
2013/3/3 Zaur Shibzukhov :
> A more accurate test...
> [...]
> 50 loops, best of 5: 20.8 msec per loop

Yet another performance comparison, this time for `time`:

# coding: utf-8

from cpython.datetime cimport import_datetime, time_new, time

import_datetime()

from datetime import time as pytime

def test_time1():
    cdef list lst = []
    cdef int hour, minute, second, microsecond
    for hour in range(0, 24):
        for minute in range(0,60):
            for second in range(0, 60):
                for microsecond in range(0, 10, 5):
                    d = pytime(hour, minute, second, microsecond)
                    lst.append(d)
    return lst

def test_time2():
    cdef list lst = []
    cdef int hour, minute, second, microsecond
    for hour in range(0, 24):
        for minute in range(0,60):
            for second in range(0, 60):
                for microsecond in range(0, 10, 5):
                    d = time(hour, minute, second, microsecond)
                    lst.append(d)
    return lst

def test_time3():
    cd