Re: [Cython] Py_UNICODE* string support

2013-03-03 Thread Nikita Nemkin
On Sun, 03 Mar 2013 13:52:49 +0600, Stefan Behnel wrote:



Are you aware that Py_UNICODE is deprecated as of Py3.3?

http://docs.python.org/3.4/c-api/unicode.html

Your changes look a bit excessive for supporting something that's
inefficient in recent Python versions and basically "dead".


Yes, I'm well aware of Py3.3 changes, but consider this:

1. _All_ system APIs on Windows, old, new and in-between, use UTF-16 in the
   form of zero-terminated 2-byte wchar_t* strings (on Windows Py_UNICODE is
   _always_ aliased to wchar_t specifically for this reason).
   Whatever happens to Python internals, the need to interoperate with
   UTF-16 based platforms won't go away.

2. The Py_UNICODE family of APIs remains the recommended way to interoperate
   with Windows. (So said the author of PEP 393 himself; I could find the
   relevant discussion in python-dev.)


3. It is not _that_ inefficient. Actually, it has the same efficiency as the
   UTF-8-related APIs (which have to be used on UTF-8 platforms like most
   *nix systems).
   UTF-8 allows sharing of the ASCII buffer and has to convert UCS2/UCS4;
   Py_UNICODE shares the UCS2 buffer (assuming a narrow build) and has to
   convert ASCII.
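This conversion asymmetry can be illustrated in plain Python (my own
illustration, not part of the patch): UTF-8 reuses an ASCII buffer
byte-for-byte but must transcode everything else, while a 2-byte
representation stores BMP text as-is and must widen ASCII.

```python
# ASCII round-trips through UTF-8 unchanged: the byte buffer can be shared.
ascii_text = "hello"
assert ascii_text.encode("utf-8") == b"hello"

# A narrow-build-style representation stores 2 bytes per BMP character,
# so even pure ASCII has to be widened on the way in.
bmp_text = "h\u00e9llo"
utf16 = bmp_text.encode("utf-16-le")
assert len(utf16) == 2 * len(bmp_text)
assert "h".encode("utf-16-le") == b"h\x00"
```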



One alternative to Py_UNICODE that I have rejected is using Python's
wchar_t support. It's practically useless for these reasons:
1) wchar_t APIs do not exist in Py2 and have to be implemented for
   compatibility.
2) Implementing them brings in all the pain of the non-portable wchar_t type
   (on *nix systems in general), whereas its primary users would target
   Windows, where the (pretty horrible) wchar_t portability workarounds
   would be dead code.
3) wchar_t APIs do not offer a zero-copy option and do not manage the
   memory for us.
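The wchar_t portability pain in point 2 is easy to observe from Python
itself (illustrative check, not from the patch):

```python
import ctypes

# wchar_t is 2 bytes on Windows (UTF-16 code units) but typically 4 bytes
# (UCS-4) on Linux and macOS -- the non-portability complained about above.
width = ctypes.sizeof(ctypes.c_wchar)
assert width in (2, 4)
```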



The changes are some 50 lines of code, not counting the tests. I wouldn't
call that excessive. And they mostly mirror existing code; no trickery of
any kind.

Built-in Py_UNICODE* support also means that users would be shielded from
the 3.3 changes, and Cython is free to optimize string handling in the
future. Believe me, nobody calls Py_UNICODE APIs because they want to;
they just have to.



Best regards,
Nikita Nemkin
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Py_UNICODE* string support

2013-03-03 Thread Stefan Behnel
Nikita Nemkin, 03.03.2013 09:25:
> On Sun, 03 Mar 2013 13:52:49 +0600, Stefan Behnel wrote:
>> Are you aware that Py_UNICODE is deprecated as of Py3.3?
>>
>> http://docs.python.org/3.4/c-api/unicode.html
>>
>> Your changes look a bit excessive for supporting something that's
>> inefficient in recent Python versions and basically "dead".
> 
> Yes, I'm well aware of Py3.3 changes, but consider this:
> 
> 1. _All_ system APIs on Windows, old, new and in-between, use UTF-16 in the
>form of zero-terminated 2-byte wchar_t* strings (on Windows Py_UNICODE is
>_always_ aliased to wchar_t specifically for this reason).
>Whatever happens to Python internals, the need to interoperate with
>UTF-16 based platforms won't go away.

Ok, fine with me.

Your changes look fairly reasonable, especially for a first try. I have the
following comments.

1) I would like to get rid of UnicodeConst. A Py_UNICODE* is not different
from any other C array, except that it can coerce to and from Unicode
strings. So the representation of a literal should be a (properly reference
counted) Python Unicode object, and users would be allowed to cast them to
Py_UNICODE*, just as we support it for char* and bytes.

2) non-BMP literals should be supported by representing them as normal
Unicode strings and creating the Py_UNICODE representation at need (i.e.
explicitly through a cast, at runtime). Py_UNICODE[] literals are simply
not portable.

3) __Pyx_Py_UNICODE_strlen() is ok, but only for the special case that all
we have is a Py_UNICODE*. As long as we are dealing with Unicode string
objects, that won't be needed, so len() should be constant time in the
normal case instead of linear time.

4) most of the changes in PyrexTypes.py and ExprNodes.py look ok. I would
eventually like to see a couple of refactorings on these sections (because
the special cases add up over time), but that's not required for this change.

So, the basic idea would be to use Unicode strings and their (optional)
internal representation as Py_UNICODE[] instead of making Py_UNICODE[] a
first class data type. And then go from there and optimise certain things
to use the unpacked array directly, so that users won't need to put
explicit C-API calls into their code.

Stefan



Re: [Cython] About IndexNode and unicode[index]

2013-03-03 Thread Zaur Shibzukhov
2013/3/2 Stefan Behnel :
> Stefan Behnel, 28.02.2013 22:16:
>
> https://github.com/scoder/cython/commit/cc4f7daec3b1f19b5acaed7766e2b6f86902ad94
>
> Stefan
>
I tried to build with that change. The `unicode_indexing` and
`index` tests pass.


Re: [Cython] Py_UNICODE* string support

2013-03-03 Thread Nikita Nemkin
On Sun, 03 Mar 2013 15:32:36 +0600, Stefan Behnel wrote:


1) I would like to get rid of UnicodeConst. A Py_UNICODE* is not different
from any other C array, except that it can coerce to and from Unicode
strings. So the representation of a literal should be a (properly
reference counted) Python Unicode object, and users would be allowed to
cast them to Py_UNICODE*, just as we support it for char* and bytes.


I understand the idea. Since Python unicode literals are implicitly
coercible to Py_UNICODE*, there appears to be no need for C-level
Py_UNICODE[] literals. Indeed, client code will look exactly (!) the same
whether they are supported or not.

Except when it comes to nogil. (For example, native callbacks are almost
guaranteed to be nogil.) Hiding Python operations in what appears to be
pure C-level code will break users' assumptions.
This is the #1 reason why I went for C-level literals. The #2 reason is
efficiency on Py3.3: C-level literals don't need conversions and don't
call any conversion APIs.



2) non-BMP literals should be supported by representing them as normal
Unicode strings and creating the Py_UNICODE representation at need (i.e.
explicitly through a cast, at runtime). Py_UNICODE[] literals are simply
not portable.


Py_UNICODE[] literals can be made fully portable if non-BMP ones are
wrapped like this:

   #ifdef Py_UNICODE_WIDE
   static const Py_UNICODE k_xxx[] = { /* code point */, 0 };
   #else
   static const Py_UNICODE k_xxx[] = { /* surrogate pair */, 0 };
   #endif

Literals containing only BMP chars are already portable and don't need
this wrapping.
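For reference, here is how a non-BMP code point splits into the UTF-16
surrogate pair that a narrow-build literal array would store (a sketch of
the encoding rule, not the generated code; the function name is mine):

```python
def utf16_units(cp):
    """Return the UTF-16 code units for a code point, as a narrow build
    (2-byte Py_UNICODE) would store them in a literal array."""
    if cp <= 0xFFFF:
        return [cp]                    # BMP: one unit, portable as-is
    cp -= 0x10000
    return [0xD800 + (cp >> 10),       # high surrogate
            0xDC00 + (cp & 0x3FF)]     # low surrogate

# U+1F600 needs a surrogate pair on narrow builds; 'A' is unchanged.
assert utf16_units(0x1F600) == [0xD83D, 0xDE00]
assert utf16_units(0x41) == [0x41]
```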

3) __Pyx_Py_UNICODE_strlen() is ok, but only for the special case that all
we have is a Py_UNICODE*. As long as we are dealing with Unicode string
objects, that won't be needed, so len() should be constant time in the
normal case instead of linear time.


len(Py_UNICODE*) simply mirrors len(char*). Its purpose is to provide a
platform-independent Py_UNICODE_strlen (which is Py3-only and deprecated
in 3.3).
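The linear scan such a strlen performs can be mimicked with a ctypes
unicode buffer (illustrative only; `__Pyx_Py_UNICODE_strlen` itself is C):

```python
import ctypes

# create_unicode_buffer yields a zero-terminated wchar_t array, the same
# shape of data a Py_UNICODE* points at; count units up to the terminator.
buf = ctypes.create_unicode_buffer("abc")
n = 0
while buf[n] != "\x00":
    n += 1
assert n == 3   # linear-time scan, exactly like strlen for char*
```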



So, the basic idea would be to use Unicode strings and their (optional)
internal representation as Py_UNICODE[] instead of making Py_UNICODE[] a
first class data type. And then go from there and optimise certain things
to use the unpacked array directly, so that users won't need to put
explicit C-API calls into their code.


Please reconsider your decision wrt C-level literals.
I believe that nogil code and a bit of efficiency (on 3.3) justify their
existence. (char* does have C-level literals; Py_UNICODE* is in the same
basket when it comes to Windows code.)
The code to support them is also small and well-contained.
I've updated my pull request to fully support non-BMP Py_UNICODE[]
literals.

If you are still not convinced, so be it, I'll drop C-level literal
support.



Best regards,
Nikita Nemkin


PS. I made a false claim in the previous mail. (Some of) Python's wchar_t
APIs do exist in Py2. But they won't manage the memory automatically
anyway.


Re: [Cython] To Add datetime.pxd to cython.cpython

2013-03-03 Thread Zaur Shibzukhov
2013/3/2 Stefan Behnel :
> Hi,
>
> the last pull request looks good to me now.
>
> https://github.com/cython/cython/pull/189
>
> Any more comments on it?
>
> Stefan
>

As was suggested earlier, I added an `import_datetime` inline function to
initialize the PyDateTime C API instead of using the "non-native" C
macros from datetime.h directly.
Now you call `import_datetime()` first, in the same way as `import_array()`
is called with `numpy`. This approach looks natural in the light of
experience with numpy.


Zaur Shibzukhov


Re: [Cython] To Add datetime.pxd to cython.cpython

2013-03-03 Thread Zaur Shibzukhov
2013/3/3 Zaur Shibzukhov :
> 2013/3/2 Stefan Behnel :
>> Hi,
>>
>> the last pull request looks good to me now.
>>
>> https://github.com/cython/cython/pull/189
>>
>> Any more comments on it?
>
> As was suggested earlier, I added `import_datetime` inline function to
> initialize PyDateTime C API instead of direct usage of "non-native" C
> macros from datetime.h.
> Now you call `import_array ()` first in the same way as is done with `numpy`.
>  This approach looks natural in the light of experience with numpy.
>
I made some performance comparisons. Here is an example for dates.

Here is the test code (# test_date.pyx):

from cpython.datetime cimport import_datetime, date_new, date

import_datetime()

from datetime import date as pydate

def test_date1():
    cdef list lst = []
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = pydate(year, month, day)
                lst.append(d)
    return lst


def test_date2():
    cdef list lst = []
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = date(year, month, day)
                lst.append(d)
    return lst

def test_date3():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = date_new(year, month, day)
                lst.append(d)
    return lst

def test1():
    l = test_date1()
    return l

def test2():
    l = test_date2()
    return l

def test3():
    l = test_date3()
    return l

Here are timings:

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test1" "test1()"
50 loops, best of 5: 83.2 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test2" "test2()"
50 loops, best of 5: 74.7 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test3" "test3()"
50 loops, best of 5: 20.9 msec per loop

OSX 10.6.8 64 bit python 3.2
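The same best-of-N, per-loop measurement can also be scripted with the
stdlib instead of run from the shell (a sketch; the helper name is mine):

```python
import timeit

def bench(fn, number=50, repeat=5):
    """Mirror `python -m timeit -n 50 -r 5`: run `repeat` rounds of
    `number` calls each and report the best per-loop time in seconds."""
    times = timeit.repeat(fn, number=number, repeat=repeat)
    return min(times) / number

per_loop = bench(lambda: sum(range(100)))
assert per_loop >= 0.0
```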

Shibzukhov Zaur


Re: [Cython] To Add datetime.pxd to cython.cpython

2013-03-03 Thread Zaur Shibzukhov
2013/3/3 Zaur Shibzukhov :
> 2013/3/3 Zaur Shibzukhov :
>> 2013/3/2 Stefan Behnel :
>>> Hi,
>>>
>>> the last pull request looks good to me now.
>>>
>>> https://github.com/cython/cython/pull/189
>>>
>>> Any more comments on it?
>>
>> As was suggested earlier, I added `import_datetime` inline function to
>> initialize PyDateTime C API instead of direct usage of "non-native" C
>> macros from datetime.h.
>> Now you call `import_array ()` first in the same way as is done with `numpy`.
>>  This approach looks natural in the light of experience with numpy.
>>
>  I make some performance comparisons. Here example for dates.
>
> # test_date.pyx
> 
>
> Here test code:
>
> from cpython.datetime cimport import_datetime, date_new, date
>
> import_datetime()
>
> from datetime import date as pydate
>
> def test_date1():
> cdef list lst = []
> for year in range(1000, 2001):
> for month in range(1,13):
> for day in range(1, 20):
> d = pydate(year, month, day)
> lst.append(d)
> return lst
>
>
> def test_date2():
> cdef list lst = []
> for year in range(1000, 2001):
> for month in range(1,13):
> for day in range(1, 20):
> d = date(year, month, day)
> lst.append(d)
> return lst
>
> def test_date3():
> cdef list lst = []
> cdef int year, month, day
> for year in range(1000, 2001):
> for month in range(1,13):
> for day in range(1, 20):
> d = date_new(year, month, day)
> lst.append(d)
> return lst
>
> def test1():
> l = test_date1()
> return l
>
> def test2():
> l = test_date2()
> return l
>
> def test3():
> l = test_date3()
> return l
>
> Here are timings:
>
> (py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
> mytests.test_date import test1" "test1()"
> 50 loops, best of 5: 83.2 msec per loop
> (py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
> mytests.test_date import test2" "test2()"
> 50 loops, best of 5: 74.7 msec per loop
> (py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
> mytests.test_date import test3" "test3()"
> 50 loops, best of 5: 20.9 msec per loop
>
> OSX 10.6.8 64 bit python 3.2
>

A more accurate test (with `cdef int year, month, day` declared in all three functions):

# coding: utf-8

from cpython.datetime cimport import_datetime, date_new, date

import_datetime()

from datetime import date as pydate

def test_date1():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = pydate(year, month, day)
                lst.append(d)
    return lst


def test_date2():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = date(year, month, day)
                lst.append(d)
    return lst

def test_date3():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = date_new(year, month, day)
                lst.append(d)
    return lst

def test1():
    l = test_date1()
    return l

def test2():
    l = test_date2()
    return l

def test3():
    l = test_date3()
    return l

Timings:

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test1" "test1()"
50 loops, best of 5: 83.3 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test2" "test2()"
50 loops, best of 5: 74.6 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test3" "test3()"
50 loops, best of 5: 20.8 msec per loop

Shibzukhov Zaur


Re: [Cython] Py_UNICODE* string support

2013-03-03 Thread Stefan Behnel
Nikita Nemkin, 03.03.2013 14:40:
> Please reconsider your decision wrt C-level literals.
> I believe that nogil code and a bit of efficiency (on 3.3) justify their
> existence. (char* literals do have C-level literals, Py_UNICODE* is in
> the same basket when it comes to Windows code).
> The code to support them is also small and well-contained.
> I've updated my pull request to fully support for non-BMP Py_UNICODE[]
> literals.

Ok, I think it's ok now. I can accept the special-casing of Py_UNICODE
literals; it actually adds value.

As one little nit-pick, may I ask you to rename the new name references
from "unicode" to "py_unicode" in your code? For example, "is_unicode",
"get_unicode_const", "unicode_const_index", etc. Given that Py_UNICODE is
no longer the native equivalent of Python's unicode type in Py3.3, I'd like
to avoid confusion in the code. The name "unicode" is much more likely to
refer to the builtin Python type than to a native C type when it appears in
Cython's sources.

Stefan



Re: [Cython] Py_UNICODE* string support

2013-03-03 Thread Stefan Behnel
Stefan Behnel, 03.03.2013 20:41:
> Nikita Nemkin, 03.03.2013 14:40:
>> Please reconsider your decision wrt C-level literals.
>> I believe that nogil code and a bit of efficiency (on 3.3) justify their
>> existence. (char* literals do have C-level literals, Py_UNICODE* is in
>> the same basket when it comes to Windows code).
>> The code to support them is also small and well-contained.
>> I've updated my pull request to fully support for non-BMP Py_UNICODE[]
>> literals.
> 
> Ok, I think it's ok now. I can accept the special casing of Py_UNICODE
> literals, it actually adds a value.
> 
> As one little nit-pick, may I ask you to rename the new name references to
> "unicode" into "py_unicode" in your code? For example, "is_unicode",
> "get_unicode_const", "unicode_const_index", etc. Given that Py_UNICODE is
> no longer the native equivalent of Python's unicode type in Py3.3, I'd like
> to avoid confusion in the code. The name "unicode" is much more likely to
> refer to the builtin Python type than to a native C type when it appears in
> Cython's sources.

Oh, and yet another thing: could you write up some documentation for this
in docs/src/tutorial/strings.rst ? Basically a Windows/wchar_t related
section, that also warns about the inefficiency in Py3.3, so that users
don't accidentally assume it's efficient for anything that needs to be
portable.

Stefan



Re: [Cython] To Add datetime.pxd to cython.cpython

2013-03-03 Thread Zaur Shibzukhov
2013/3/3 Zaur Shibzukhov :
> [earlier benchmark code and timings quoted in full; snipped]

Yet another performance comparison for `time`:

# coding: utf-8

from cpython.datetime cimport import_datetime, time_new, time

import_datetime()

from datetime import time as pytime

def test_time1():
    cdef list lst = []
    cdef int hour, minute, second, microsecond
    for hour in range(0, 24):
        for minute in range(0, 60):
            for second in range(0, 60):
                for microsecond in range(0, 10, 5):
                    d = pytime(hour, minute, second, microsecond)
                    lst.append(d)
    return lst


def test_time2():
    cdef list lst = []
    cdef int hour, minute, second, microsecond
    for hour in range(0, 24):
        for minute in range(0, 60):
            for second in range(0, 60):
                for microsecond in range(0, 10, 5):
                    d = time(hour, minute, second, microsecond)
                    lst.append(d)
    return lst

def test_time3():
cd