[Cython] Recent bugs in generators

2011-04-17 Thread Vitja Makarov
Hi!

1. Lambda-generator:

The previous implementation was modelled on Python 2.6:
>>> list((lambda:((yield 1), (yield 2)))())
[1, 2, (None, None)]

Things changed in Python 2.7:
>>> list((lambda:((yield 1), (yield 2)))())
[1, 2]

2. GeneratorExit is initialized to StopIteration when running the
generators_py doctests

This is strange behaviour of the doctest module; try this:
x = __builtins__
def foo():
    """
    >>> type(x)

    """

3. check_yield_in_exception()

Cython calls __Pyx_ExceptionReset only when the except block is done, so when
a yield occurs inside the block, no exception reset is called at that point.

I'm not sure how to fix this.

import sys

def foo():
    """
    >>> list(foo())
    [, None]
    """
    try:
        raise ValueError
    except ValueError:
        yield sys.exc_info()[0]
        yield sys.exc_info()[0] # exc_info is lost here


Here is a quick fix for 1 and 2: https://github.com/cython/cython/pull/25
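For reference, a minimal sketch of item 1 on a current Python 3 interpreter (an assumption; the message above only compares 2.6 and 2.7). The helper name drain is hypothetical. It suggests that the yielded values match the 2.7 output, while the value of the lambda body ends up as the generator's return value instead of being yielded:

```python
def drain(gen):
    """Collect the yielded values and the generator's return value."""
    yielded = []
    while True:
        try:
            yielded.append(next(gen))
        except StopIteration as stop:
            # In Python 3 the return value travels on StopIteration.value.
            return yielded, stop.value


lambda_gen = (lambda: ((yield 1), (yield 2)))()
yielded, returned = drain(lambda_gen)

print(yielded)   # values produced by the two yield expressions
print(returned)  # value of the lambda body itself
```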

-- 
vitja.
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.

2011-04-17 Thread Arthur de Souza Ribeiro
2011/4/15 Stefan Behnel 

> [please avoid top-posting]
>
> Arthur de Souza Ribeiro, 15.04.2011 04:31:
>
>> I've created the .pyx files and it passed in all python tests.
>
> Fine.
>
> As far as I can see, you only added static types in some places.
> Did you test if they are actually required (maybe using "cython -a")? Some
> of them look rather counterproductive and should lead to a major slow-down.


In fact, I didn't, but after you told me to do that, I ran cython -a and
removed some unnecessary types.


> I added comments to your initial commit.
>

Hi Stefan, about your first comment, "And it's better to let Cython know
that this name refers to a function.", on line 69 of the encoder.pyx file: I
didn't quite understand what that means. Could you explain this comment a
bit more?

About the other comments, I think I've solved them all. If there is any
problem with them or with other ones, please tell me and I'll try to fix it.


> Note that it's not obvious from your initial commit what you actually
> changed. It would have been better to import the original file first, rename
> it to .pyx, and then commit your changes.
>

I created a directory named 'Diff files' where I put the files generated by
the 'diff' command that I ran on my computer. If you think it would still be
better for me to commit the originals first and then change them, that's no
problem for me...


>
> It appears that you accidentally added your .c and .so files to your repo.
>
>
> https://github.com/arthursribeiro/JSON-module
>
>
Removed them.


>
>> To test them, as I said, I copied the .py test files to my project
>> directory, generated the .so files, import them instead of python modules
>> and run. I run every test file and it passed in all of them. To run the
>> tests, run the file 'run-tests.sh'
>>
>> I used just .pyx in this module, should I reimplement it using pxd with
>> the normal .py?
>>
>
> Not at this point. I think it's more important to get some performance
> numbers to see how your module behaves compared to the C accelerator module
> (_json.c). I think the best approach to this project would actually be to
> start with profiling the Python implementation to see where performance
> problems occur (or to look through _json.c to see what the CPython
> developers considered performance critical), and then put the focus on
> trying to speed up only those parts of the Python implementation, by adding
> static types and potentially even rewriting them in a way that Cython can
> optimise them better.
>

I've profiled the module I created and the module that is in Python 3.2.
The result is that the Cython module spent about 73% less time than Python's
one. The output was like this (blue for cython, red for python):

The behavior of my module and Python's one seems to be the same, which I
think is the way it should be.

JSONModule nested_dict
         10004 function calls in 0.268 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.196    0.000    0.196    0.000 :0(dumps)
        1    0.000    0.000    0.268    0.268 :0(exec)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.072    0.072    0.268    0.268 <string>:1(<module>)
        1    0.000    0.000    0.268    0.268 profile:0(for ii in range(1):  fun(thing))
        0    0.000             0.000          profile:0(profiler)


json nested_dict
         60004 function calls in 1.016 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.016    1.016 :0(exec)
    20000    0.136    0.000    0.136    0.000 :0(isinstance)
    10000    0.120    0.000    0.120    0.000 :0(join)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.088    0.088    1.016    1.016 <string>:1(<module>)
    10000    0.136    0.000    0.928    0.000 __init__.py:180(dumps)
    10000    0.308    0.000    0.792    0.000 encoder.py:172(encode)
    10000    0.228    0.000    0.228    0.000 encoder.py:193(iterencode)
        1    0.000    0.000    1.016    1.016 profile:0(for ii in range(1):  fun(thing))
        0    0.000             0.000          profile:0(profiler)


JSONModule ustring
         10004 function calls in 0.140 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.072    0.000    0.072    0.000 :0(dumps)
        1    0.000    0.000    0.140    0.140 :0(exec)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.068    0.068    0.140    0.140 <string>:1(<module>)
        1    0.000    0.000    0.140    0.140 profile:0(for ii in range(1):  fun(thing))
        0    0.000             0.000          profile:0(profiler)


json ustring
 40004 function calls in 0.580 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.092    0.000    0.092    0.000 :0(encode_basestring_ascii)
10.0040.004 

Re: [Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.

2011-04-17 Thread Sturla Molden

On 17.04.2011 20:07, Arthur de Souza Ribeiro wrote:

> I've profiled the module I created and the module that is in Python
> 3.2, the result is that the cython module spent about 73% less time
> than python's one, the output to the module was like this (blue for
> cython, red for python):

The number of function calls is different. For nested_dict, you have
37320 calls per second for Cython and 59059 calls per second for Python.
I am not convinced that is better.
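The rates quoted above can be rechecked from the two nested_dict profile headers (10004 calls in 0.268 s and 60004 calls in 1.016 s). This is only a sketch of the arithmetic; the dictionary keys and the helper name are made up for illustration:

```python
# (total function calls, total seconds) as printed in the profile headers.
profiles = {
    "JSONModule (Cython)": (10004, 0.268),
    "json (Python)": (60004, 1.016),
}


def calls_per_second(calls, seconds):
    """Profiled function calls handled per second of profiled run time."""
    return calls / seconds


rates = {name: calls_per_second(*data) for name, data in profiles.items()}

for name, rate in rates.items():
    print("%-20s %.0f calls/s" % (name, rate))
```

The figures come out at roughly 37,300 and 59,100 calls per second, in line with the numbers quoted above.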


Sturla


Re: [Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.

2011-04-17 Thread Stefan Behnel

Sturla Molden, 17.04.2011 20:24:
> On 17.04.2011 20:07, Arthur de Souza Ribeiro wrote:
>> I've profiled the module I created and the module that is in Python 3.2,
>> the result is that the cython module spent about 73% less time than
>> python's one, the output to the module was like this (blue for cython,
>> red for python):
>
> The number of function calls is different. For nested_dict, you have 37320
> calls per second for Cython and 59059 calls per second for Python. I am not
> convinced that is better.

Note that there are 20000 calls to isinstance(), which Cython handles
internally. The profiler cannot see those.

However, the different number of function calls also makes the profiling
results less comparable, since there are fewer calls into the profiler.
This leads to a lower performance penalty for Cython in the absolute
timings, and consequently to an unfair comparison.


Stefan


Re: [Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.

2011-04-17 Thread Sturla Molden

On 17.04.2011 21:16, Stefan Behnel wrote:

> However, the different number of function calls also makes the
> profiling results less comparable, since there are fewer calls into
> the profiler. This leads to a lower performance penalty for Cython in
> the absolute timings, and consequently to an unfair comparison.

As I understand it, the profiler will give a profile of a module.

To measure absolute performance, one should use timeit or just time.clock.
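A minimal sketch of the timeit approach, using the standard library json module as a stand-in for the module under test. The payload and the helper name bench are assumptions for illustration. Note that time.clock was removed in Python 3.8; time.perf_counter is the usual replacement there.

```python
import json
import timeit

# Hypothetical payload, roughly in the spirit of the nested_dict test case.
thing = {"key%d" % i: {"nested": list(range(10))} for i in range(20)}


def bench(func, payload, number=1000, repeat=3):
    """Return the best wall-clock time for `number` calls of func(payload)."""
    timer = timeit.Timer(lambda: func(payload))
    # The minimum over several repeats is least disturbed by other load.
    return min(timer.repeat(repeat=repeat, number=number))


best = bench(json.dumps, thing)
print("json.dumps: %.6f s per 1000 calls" % best)
```

Running the same bench() call against the Cython-built dumps would give timings that are not distorted by profiler hooks.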

Sturla




Re: [Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.

2011-04-17 Thread Stefan Behnel

Arthur de Souza Ribeiro, 17.04.2011 20:07:
> Hi Stefan, about your first comment : "And it's better to let Cython know
> that this name refers to a function."  in line 69 of encoder.pyx file I
> didn't understand well what does that mean, can you explain more this
> comment?

Hmm, sorry, I think that was not so important. That code line is only used 
to override the Python implementation with the implementation from the 
external C accelerator module. To do that, it assigns either of the two 
functions to a name. So, when that name is called in the code, Cython 
cannot know that it actually is a function, and has to resort to Python 
calling, whereas a visible c(p)def function that is defined inside of the 
same module could be called faster.


I missed the fact that this name isn't really used inside of the module, so 
whether Cython knows that it's a function or not isn't really all that 
important.
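The override pattern described above looks roughly like the following sketch, which mirrors how the standard library's json/encoder.py picks between its Python function and the C accelerator. Because the final name is bound at runtime to one of two objects, a compiler looking at a call through that name cannot assume a statically known function:

```python
from json.encoder import py_encode_basestring_ascii

try:
    # C accelerator; may be missing on some Python implementations.
    from _json import encode_basestring_ascii as c_encode_basestring_ascii
except ImportError:
    c_encode_basestring_ascii = None

# The name is rebound at import time, so what it refers to is only known
# at runtime. Calls through it must use the generic Python call protocol.
encode_basestring_ascii = (
    c_encode_basestring_ascii or py_encode_basestring_ascii
)

print(encode_basestring_ascii("caf\u00e9"))
```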


I added another comment to this commit, though:

https://github.com/arthursribeiro/JSON-module/commit/e2d80e0aeab6d39ff2d9b847843423ebdb9c57b7#diff-4



> About the other comments, I think I solved them all, any problem with them
> or other ones, please tell me. I'll try to fix.


It looks like you fixed a good deal of them.

I actually tried to work with your code, but I'm not sure how you are 
building it. Could you give me a hint on that?


Where did you actually take the original code from? Python 3.2? Or from 
Python's hg branch?




>> Note that it's not obvious from your initial commit what you actually
>> changed. It would have been better to import the original file first, rename
>> it to .pyx, and then commit your changes.
>
> I created a directory named 'Diff files' where I put the files generated by
> 'diff' command that I ran on my computer, if you think it still be better if
> I commit and then change, there is no problem for me...


Diff only gives you the final outcome. Committing on top of the original 
files has the advantage of making the incremental changes visible 
separately. That makes it clearer what you tried, and a good commit comment 
will then make it clear why you did it.




>> I think it's more important to get some performance
>> numbers to see how your module behaves compared to the C accelerator module
>> (_json.c). I think the best approach to this project would actually be to
>> start with profiling the Python implementation to see where performance
>> problems occur (or to look through _json.c to see what the CPython
>> developers considered performance critical), and then put the focus on
>> trying to speed up only those parts of the Python implementation, by adding
>> static types and potentially even rewriting them in a way that Cython can
>> optimise them better.
>
> I've profiled the module I created and the module that is in Python 3.2,
> the result is that the cython module spent about 73% less time than python's


That's a common mistake when profiling: the actual time it takes to run is 
not meaningful. Depending on how far the two profiled programs differ, they 
may interact with the profiler in more or less intensive ways (as is 
clearly the case here), so the total time it takes for the programs to run 
can differ quite heavily under profiling, even if the non-profiled programs 
run at exactly the same speed.
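The effect can be seen with a small experiment: time the same function once plainly and once under cProfile. This is a sketch only; the function many_small_calls is a hypothetical workload chosen because it makes many cheap calls, which is where the profiler hooks tend to cost the most.

```python
import cProfile
import timeit


def many_small_calls(n):
    """Workload dominated by cheap calls, each of which the profiler sees."""
    total = 0
    for i in range(n):
        total += len(str(i))
    return total


# Plain wall-clock time for five runs.
plain = timeit.timeit(lambda: many_small_calls(20000), number=5)

# The same five runs, each executed under the profiler.
profiler = cProfile.Profile()
profiled = timeit.timeit(
    lambda: profiler.runcall(many_small_calls, 20000), number=5
)

print("plain:    %.4f s" % plain)
print("profiled: %.4f s" % profiled)
```

The size of the gap depends on how many calls the profiler gets to see, which is exactly what differs between the compiled module and the pure Python one.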


Also, I don't think you have enabled profiling for the Cython code. You can 
do that by passing the "profile=True" directive to the compiler, or by 
putting it at the top of the source files. That will add module-inner 
function calls to the profiling output. Note, however, that enabling 
profiling will slow down the execution, so disable it when you measure 
absolute run times.


http://docs.cython.org/src/tutorial/profiling_tutorial.html
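As a sketch of the in-file variant: the directive goes into a header comment at the top of the source file. The file below is plain Python that would also be acceptable to Cython; the function names and data are made up for illustration, and the profiler invocation uses the standard cProfile and pstats modules.

```python
# cython: profile=True
# When this file is compiled by Cython, the header comment above enables
# profiling hooks for its functions. Plain Python treats it as a comment.
import cProfile
import io
import pstats


def encode_pairs(pairs):
    """Hypothetical inner function that should show up in the profile."""
    return ", ".join("%s: %s" % (key, value) for key, value in pairs)


def encode_all(rows):
    """Hypothetical entry point calling the inner function many times."""
    return [encode_pairs(row.items()) for row in rows]


rows = [{"a": i, "b": i * 2} for i in range(1000)]

profiler = cProfile.Profile()
profiler.runcall(encode_all, rows)

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Without the directive, a compiled module would only show up through its entry point, as in the short JSONModule listings in this thread.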



> (blue for cython, red for python):


Colours tend to pass rather badly through mailing lists. Many people 
disable the HTML presentation of e-mails, and plain text does not have 
colours. But it was still obvious enough what you meant.




> The behavior between my module and python's one seems to be the same I think
> that's the way it should be.
>
> JSONModule nested_dict
>          10004 function calls in 0.268 seconds
>
>    Ordered by: standard name
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>     10000    0.196    0.000    0.196    0.000 :0(dumps)


This is a pretty short list (I stripped the uninteresting parts). The 
profile right below shows a lot more entries in encoder.py. It would be 
good to see these calls in the Cython code as well.




> json nested_dict
>          60004 function calls in 1.016 seconds
>
>    Ordered by: standard name
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000    1.016    1.016 :0(exec)
>     20000    0.136    0.000    0.136    0.000 :0(isinstance)
>     10000    0.120    0.000    0.120    0.000 :0(join)
>         1    0.000    0.000    0.000    0.000 :0(setprofile)
>         1    0.088    0.088    1.016    1.016 <string>:1(<module>)
>     10000    0.136    0.000    0.928    0.000 _

Re: [Cython] Recent bugs in generators

2011-04-17 Thread Stefan Behnel

Vitja Makarov, 17.04.2011 17:57:

> 3. check_yield_in_exception()


I added this because I found a failing pyregr test that uses it (testing 
the @contextmanager decorator).




> Cython calls __Pyx_ExceptionReset when except block is done, so when
> yield is there no exception reset is called.
>
> I'm not sure how to fix this.


I'm not completely sure either.



> import sys
>
> def foo():
>     """
>     >>> list(foo())
>     [, None]
>     """
>     try:
>         raise ValueError
>     except ValueError:
>         yield sys.exc_info()[0]
>         yield sys.exc_info()[0] # exc_info is lost here


I think (!), the difference here is that CPython actually keeps the 
exception in the generator frame. We don't have a frame, so we have to 
emulate it using the closure class. I guess we'll have to store away the 
exception into the closure when we yield while an exception is being 
handled, and restore it afterwards. Note: this is not the exception that is 
freshly *being* raised (the "_cur*" fields in the thread state), it's the 
exception that *was* raised and is now being handled, i.e. the thread state 
fields without the "_cur", that are reflected by sys.exc_info().
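For comparison, this is the behaviour a Python 3 interpreter shows for the quoted test case, where the handled exception stays attached to the generator across the yield. It is a sketch of the target semantics only, not of the closure-based fix discussed above:

```python
import sys


def foo():
    try:
        raise ValueError
    except ValueError:
        # Both yields sit inside the except block, as in the quoted test.
        yield sys.exc_info()[0]
        # After resumption, the handled exception is visible again.
        yield sys.exc_info()[0]


seen = list(foo())
print(seen)
```

On Python 3 both entries are the ValueError class, whereas the output quoted in this thread has None in the second position.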


Stefan


Re: [Cython] Recent bugs in generators

2011-04-17 Thread Vitja Makarov
2011/4/18 Stefan Behnel :
> Vitja Makarov, 17.04.2011 17:57:
>>
>> 3. check_yield_in_exception()
>
> I added this because I found a failing pyregr test that uses it (testing the
> @contextmanager decorator).
>
>
>> Cython calls __Pyx_ExceptionReset when except block is done, so when
>> yield is there no exception reset is called.
>>
>> I'm not sure how to fix this.
>
> I'm not completely sure either.
>
>
>> import sys
>>
>> def foo():
>>     """
>>     >>>  list(foo())
>>     [, None]
>>     """
>>     try:
>>         raise ValueError
>>     except ValueError:
>>         yield sys.exc_info()[0]
>>         yield sys.exc_info()[0] # exc_info is lost here
>
> I think (!), the difference here is that CPython actually keeps the
> exception in the generator frame. We don't have a frame, so we have to
> emulate it using the closure class. I guess we'll have to store away the
> exception into the closure when we yield while an exception is being
> handled, and restore it afterwards. Note: this is not the exception that is
> freshly *being* raised (the "_cur*" fields in the thread state), it's the
> exception that *was* raised and is now being handled, i.e. the thread state
> fields without the "_cur", that are reflected by sys.exc_info().
>

Interesting difference between py2 and py3:

def foo():
    try:
        raise ValueError
    except ValueError:
        yield
        raise
list(foo())

  File "xxx.py", line 7, in <module>
    list(foo())
  File "xxx.py", line 6, in foo
    raise
TypeError: exceptions must be old-style classes or derived from
BaseException, not NoneType

It seems that the exception info is completely lost (tried 2.6, 2.7); this
seems to be fixed in Python 3.
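A sketch of the Python 3 side of this comparison, assuming the bare raise sits inside the except block after the yield. The exception message is added only to make the re-raised object recognisable:

```python
def foo():
    try:
        raise ValueError("original")
    except ValueError:
        yield
        # After resumption, the handled exception is still active,
        # so a bare raise re-raises it.
        raise


gen = foo()
first = next(gen)  # runs up to the yield inside the except block

reraised = None
try:
    next(gen)      # resumes the generator and hits the bare raise
except ValueError as exc:
    reraised = exc

print(first, repr(reraised))
```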

Btw, the exception info temps are already saved and restored between yields.


-- 
vitja.


Re: [Cython] Recent bugs in generators

2011-04-17 Thread Stefan Behnel

Vitja Makarov, 18.04.2011 06:38:
> 2011/4/18 Stefan Behnel:
>> Vitja Makarov, 17.04.2011 17:57:
>>> 3. check_yield_in_exception()
>>
>> I added this because I found a failing pyregr test that uses it (testing the
>> @contextmanager decorator).
>>
>>> Cython calls __Pyx_ExceptionReset when except block is done, so when
>>> yield is there no exception reset is called.
>>>
>>> I'm not sure how to fix this.
>>
>> I'm not completely sure either.
>>
>>> import sys
>>>
>>> def foo():
>>>     """
>>>     >>> list(foo())
>>>     [, None]
>>>     """
>>>     try:
>>>         raise ValueError
>>>     except ValueError:
>>>         yield sys.exc_info()[0]
>>>         yield sys.exc_info()[0] # exc_info is lost here
>>
>> I think (!), the difference here is that CPython actually keeps the
>> exception in the generator frame. We don't have a frame, so we have to
>> emulate it using the closure class. I guess we'll have to store away the
>> exception into the closure when we yield while an exception is being
>> handled, and restore it afterwards. Note: this is not the exception that is
>> freshly *being* raised (the "_cur*" fields in the thread state), it's the
>> exception that *was* raised and is now being handled, i.e. the thread state
>> fields without the "_cur", that are reflected by sys.exc_info().
>
> Interesting difference between py2 and py3:
>
> def foo():
>     try:
>         raise ValueError
>     except ValueError:
>         yield
>         raise
> list(foo())
>
>   File "xxx.py", line 7, in <module>
>     list(foo())
>   File "xxx.py", line 6, in foo
>     raise
> TypeError: exceptions must be old-style classes or derived from
> BaseException, not NoneType
>
> It seems that exception info is completely lost (tried 2.6, 2.7) and
> seems to be fixed in python3.


Not surprising. The implementation is completely different in Py2 and Py3, 
both in CPython and in Cython. It's actually much simpler in Cython under 
Py3, due to better semantics and C-API support. That also implies that 
there's much less Cython can do wrong in that environment. ;-)




> Btw exception info temps are already saved and restored between yields.


Right, but the exc_info itself is not reset and recovered around the yield. 
As I said above, generators have their own lifetime frame in CPython, and 
exceptions don't leak from that. So, whenever it's the generator (or code 
called by it) that raises an exception, that must be kept local to the 
generator.
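The "kept local to the generator" rule can be observed from the caller's side on Python 3. In this sketch (variable names are illustrative) the generator is suspended inside its except block while the caller inspects its own exception state, first with nothing being handled and then while handling an unrelated KeyError:

```python
import sys


def gen():
    try:
        raise ValueError
    except ValueError:
        yield sys.exc_info()[0]
        yield sys.exc_info()[0]


g = gen()
first = next(g)               # generator is now suspended inside `except`
outside = sys.exc_info()[0]   # the caller is not handling anything

try:
    raise KeyError
except KeyError:
    second = next(g)              # generator resumes in its own handler
    caller_view = sys.exc_info()[0]  # caller still sees its own exception

print(first, outside, second, caller_view)
```

The generator sees ValueError both times, while the caller sees nothing and then its own KeyError, so neither side's exception state leaks into the other.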


Stefan