[Cython] Recent bugs in generators
Hi!

1. Lambda-generator:

Previous implementation was inspired by Python 2.6:

>>> list((lambda:((yield 1), (yield 2)))())
[1, 2, (None, None)]

Things changed in Python 2.7:

>>> list((lambda:((yield 1), (yield 2)))())
[1, 2]

2. GeneratorExit is initialized to StopIteration when running the
generators_py doctests.

This is strange behaviour of the doctest module; try this:

x = __builtins__

def foo():
    """
    >>> type(x)
    """

3. check_yield_in_exception()

Cython calls __Pyx_ExceptionReset when an except block is done, so when a
yield is there, no exception reset is called.

I'm not sure how to fix this.

import sys

def foo():
    """
    >>> list(foo())
    [<type 'exceptions.ValueError'>, None]
    """
    try:
        raise ValueError
    except ValueError:
        yield sys.exc_info()[0]
    yield sys.exc_info()[0]  # exc_info is lost here

Here is a quick fix for 1 and 2:
https://github.com/cython/cython/pull/25

--
vitja.
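(The "try this" snippet in point 2 can be turned into a self-contained
reproduction. The sketch below is an assumption about what is being
observed rather than a confirmed diagnosis: in CPython, __builtins__ is the
builtins module inside __main__ but a plain dict in an imported module,
which is the situation a doctest-driven test runner puts the module under
test in.)

    import doctest

    x = __builtins__   # captured when the module's code is executed

    def foo():
        """
        >>> type(x)        # doctest: +ELLIPSIS
        <...>
        """

    if __name__ == "__main__":
        # Run directly, type(x) is the module type; run through a test
        # runner that imports this file first, type(x) is dict.
        doctest.testmod(verbose=True)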
Re: [Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.
2011/4/15 Stefan Behnel:

> [please avoid top-posting]
>
> Arthur de Souza Ribeiro, 15.04.2011 04:31:
>> I've created the .pyx files and they passed all the Python tests.
>
> Fine.
>
> As far as I can see, you only added static types in some places. Did you
> test if they are actually required (maybe using "cython -a")? Some of them
> look rather counterproductive and should lead to a major slow-down.

In fact, I didn't, but after you told me to do that, I ran cython -a and
removed some unnecessary types.

> I added comments to your initial commit.

Hi Stefan, about your first comment: "And it's better to let Cython know
that this name refers to a function." in line 69 of the encoder.pyx file,
I didn't understand well what that means; can you explain this comment a
bit more? About the other comments, I think I solved them all; if there is
any problem with them or with other ones, please tell me and I'll try to
fix it.

> Note that it's not obvious from your initial commit what you actually
> changed. It would have been better to import the original file first,
> rename it to .pyx, and then commit your changes.

I created a directory named 'Diff files' where I put the files generated by
the 'diff' command that I ran on my computer. If you think it would still
be better for me to commit first and then change, that is no problem for
me...

> It appears that you accidentally added your .c and .so files to your repo.
>
>> https://github.com/arthursribeiro/JSON-module

Removed them.

>> To test them, as I said, I copied the .py test files to my project
>> directory, generated the .so files, imported them instead of the Python
>> modules and ran them. I ran every test file and they all passed. To run
>> the tests, run the file 'run-tests.sh'.
>>
>> I used just .pyx in this module, should I reimplement it using pxd with
>> the normal .py?
>
> Not at this point. I think it's more important to get some performance
> numbers to see how your module behaves compared to the C accelerator
> module (_json.c). I think the best approach to this project would actually
> be to start with profiling the Python implementation to see where
> performance problems occur (or to look through _json.c to see what the
> CPython developers considered performance critical), and then put the
> focus on trying to speed up only those parts of the Python implementation,
> by adding static types and potentially even rewriting them in a way that
> Cython can optimise them better.

I've profiled the module I created and the module that is in Python 3.2;
the result is that the Cython module spent about 73% less time than
Python's one. The output for the modules was like this (blue for Cython,
red for Python):

The behavior between my module and Python's one seems to be the same; I
think that's the way it should be.
JSONModule nested_dict
         10004 function calls in 0.268 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.196    0.000    0.196    0.000 :0(dumps)
        1    0.000    0.000    0.268    0.268 :0(exec)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.072    0.072    0.268    0.268 <string>:1(<module>)
        1    0.000    0.000    0.268    0.268 profile:0(for ii in range(1): fun(thing))
        0    0.000             0.000          profile:0(profiler)

json nested_dict
         60004 function calls in 1.016 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.016    1.016 :0(exec)
        2    0.136    0.000    0.136    0.000 :0(isinstance)
        1    0.120    0.000    0.120    0.000 :0(join)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.088    0.088    1.016    1.016 <string>:1(<module>)
        1    0.136    0.000    0.928    0.000 __init__.py:180(dumps)
        1    0.308    0.000    0.792    0.000 encoder.py:172(encode)
        1    0.228    0.000    0.228    0.000 encoder.py:193(iterencode)
        1    0.000    0.000    1.016    1.016 profile:0(for ii in range(1): fun(thing))
        0    0.000             0.000          profile:0(profiler)

JSONModule ustring
         10004 function calls in 0.140 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.072    0.000    0.072    0.000 :0(dumps)
        1    0.000    0.000    0.140    0.140 :0(exec)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.068    0.068    0.140    0.140 <string>:1(<module>)
        1    0.000    0.000    0.140    0.140 profile:0(for ii in range(1): fun(thing))
        0    0.000             0.000          profile:0(profiler)

json ustring
         40004 function calls in 0.580 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.092    0.000    0.092    0.000 :0(encode_basestring_ascii)
        1    0.004    0.004
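(Stefan asks further down how the project is built. A minimal way to
compile such .pyx files into importable extension modules is a standard
Cython distutils script; this is only a sketch, not Arthur's actual
run-tests.sh, and only encoder.pyx is named in the thread.)

    # setup.py -- build sketch: compile the converted .pyx files into
    # extension modules that can be imported in place of the pure-Python
    # json modules.  Run with:  python setup.py build_ext --inplace
    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Distutils import build_ext

    setup(
        cmdclass={'build_ext': build_ext},
        ext_modules=[
            # encoder.pyx is named in the thread; any other converted
            # modules are added the same way.
            Extension("encoder", ["encoder.pyx"]),
        ],
    )

Building in place drops the .so files next to the sources, so the copied
CPython test files can import them directly.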
Re: [Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.
Den 17.04.2011 20:07, skrev Arthur de Souza Ribeiro:
> I've profiled the module I created and the module that is in Python 3.2;
> the result is that the Cython module spent about 73% less time than
> Python's one. The output for the modules was like this (blue for Cython,
> red for Python):

The number of function calls is different. For nested_dict, you have 37320
calls per second for Cython and 59059 calls per second for Python. I am not
convinced that is better.

Sturla
Re: [Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.
Sturla Molden, 17.04.2011 20:24:
> Den 17.04.2011 20:07, skrev Arthur de Souza Ribeiro:
>> I've profiled the module I created and the module that is in Python 3.2;
>> the result is that the Cython module spent about 73% less time than
>> Python's one. The output for the modules was like this (blue for Cython,
>> red for Python):
>
> The number of function calls is different. For nested_dict, you have 37320
> calls per second for Cython and 59059 calls per second for Python. I am
> not convinced that is better.

Note that there are 2 calls to isinstance(), which Cython handles
internally. The profiler cannot see those.

However, the different number of function calls also makes the profiling
results less comparable, since there are fewer calls into the profiler.
This leads to a lower performance penalty for Cython in the absolute
timings, and consequently to an unfair comparison.

Stefan
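(One way to see which calls Cython folds into C code, and which therefore
never show up in a profile, is the annotated HTML output mentioned earlier
in the thread. A minimal example, not taken from the JSON module itself:)

    # check_isinstance.pyx -- compile with "cython -a check_isinstance.pyx"
    # and open the generated HTML annotation: the isinstance() test below is
    # translated into a plain C type check instead of a Python-level call,
    # which is why the profiler cannot see it.
    def is_mapping(obj):
        return isinstance(obj, dict)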
Re: [Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.
Den 17.04.2011 21:16, skrev Stefan Behnel:
> However, the different number of function calls also makes the profiling
> results less comparable, since there are fewer calls into the profiler.
> This leads to a lower performance penalty for Cython in the absolute
> timings, and consequently to an unfair comparison.

As I understand it, the profiler will give a profile of a module. To
measure absolute performance, one should use timeit or just time.clock.

Sturla
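(A sketch of such an absolute-performance measurement with timeit; the
module name cyjson and the helper module data.py holding the test document
are assumptions, not names from the actual project:)

    import timeit

    setup_py = "import json; from data import nested_dict as thing"
    setup_cy = "import cyjson as json; from data import nested_dict as thing"
    stmt = "json.dumps(thing)"

    n = 10000
    py_time = timeit.timeit(stmt, setup=setup_py, number=n)
    cy_time = timeit.timeit(stmt, setup=setup_cy, number=n)
    print("stdlib json: %.3fs  Cython: %.3fs  (%d calls each)"
          % (py_time, cy_time, n))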
Re: [Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.
Arthur de Souza Ribeiro, 17.04.2011 20:07:
> Hi Stefan, about your first comment: "And it's better to let Cython know
> that this name refers to a function." in line 69 of the encoder.pyx file,
> I didn't understand well what that means; can you explain this comment a
> bit more?

Hmm, sorry, I think that was not so important. That code line is only used
to override the Python implementation with the implementation from the
external C accelerator module. To do that, it assigns either of the two
functions to a name. So, when that name is called in the code, Cython
cannot know that it actually is a function, and has to resort to Python
calling, whereas a visible c(p)def function that is defined inside of the
same module could be called faster. I missed the fact that this name isn't
really used inside of the module, so whether Cython knows that it's a
function or not isn't really all that important.

I added another comment to this commit, though:

https://github.com/arthursribeiro/JSON-module/commit/e2d80e0aeab6d39ff2d9b847843423ebdb9c57b7#diff-4

> About the other comments, I think I solved them all; if there is any
> problem with them or with other ones, please tell me and I'll try to fix
> it.

It looks like you fixed a good deal of them. I actually tried to work with
your code, but I'm not sure how you are building it. Could you give me a
hint on that?

Where did you actually take the original code from? Python 3.2? Or from
Python's hg branch?

>> Note that it's not obvious from your initial commit what you actually
>> changed. It would have been better to import the original file first,
>> rename it to .pyx, and then commit your changes.
>
> I created a directory named 'Diff files' where I put the files generated
> by the 'diff' command that I ran on my computer. If you think it would
> still be better for me to commit first and then change, that is no
> problem for me...

Diff only gives you the final outcome. Committing on top of the original
files has the advantage of making the incremental changes visible
separately. That makes it clearer what you tried, and a good commit comment
will then make it clear why you did it.

>> I think it's more important to get some performance numbers to see how
>> your module behaves compared to the C accelerator module (_json.c). I
>> think the best approach to this project would actually be to start with
>> profiling the Python implementation to see where performance problems
>> occur (or to look through _json.c to see what the CPython developers
>> considered performance critical), and then put the focus on trying to
>> speed up only those parts of the Python implementation, by adding static
>> types and potentially even rewriting them in a way that Cython can
>> optimise them better.
>
> I've profiled the module I created and the module that is in Python 3.2;
> the result is that the Cython module spent about 73% less time than
> Python's.

That's a common mistake when profiling: the actual time it takes to run is
not meaningful. Depending on how far the two profiled programs differ, they
may interact with the profiler in more or less intensive ways (as is
clearly the case here), so the total time it takes for the programs to run
can differ quite heavily under profiling, even if the non-profiled programs
run at exactly the same speed.

Also, I don't think you have enabled profiling for the Cython code. You can
do that by passing the "profile=True" directive to the compiler, or by
putting it at the top of the source files. That will add module-inner
function calls to the profiling output.
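(A sketch of what that looks like: the directive goes at the very top of
the .pyx file, and a small driver then profiles a call into the compiled
module. The driver below uses assumed names, it is not the actual benchmark
script from the thread:)

    # At the top of encoder.pyx, before any other code:
    #     # cython: profile=True

    # driver.py -- profile a call into the compiled module (assumes the
    # extension is importable as "encoder" and mirrors the stdlib
    # json.encoder API).
    import profile
    import encoder

    thing = {"spam": [1, 2, 3], "eggs": {"nested": True}}
    fun = encoder.JSONEncoder().encode
    profile.run("for ii in range(10000): fun(thing)")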
Note, however, that enabling profiling will slow down the execution, so
disable it when you measure absolute run times.

http://docs.cython.org/src/tutorial/profiling_tutorial.html

> (blue for Cython, red for Python):

Colours tend to pass rather badly through mailing lists. Many people
disable the HTML presentation of e-mails, and plain text does not have
colours. But it was still obvious enough what you meant.

> The behavior between my module and Python's one seems to be the same; I
> think that's the way it should be.
>
> JSONModule nested_dict
>          10004 function calls in 0.268 seconds
>
>    Ordered by: standard name
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.196    0.000    0.196    0.000 :0(dumps)

This is a pretty short list (I stripped the uninteresting parts). The
profile right below shows a lot more entries in encoder.py. It would be
good to see these calls in the Cython code as well.

> json nested_dict
>          60004 function calls in 1.016 seconds
>
>    Ordered by: standard name
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000    1.016    1.016 :0(exec)
>         2    0.136    0.000    0.136    0.000 :0(isinstance)
>         1    0.120    0.000    0.120    0.000 :0(join)
>         1    0.000    0.000    0.000    0.000 :0(setprofile)
>         1    0.088    0.088    1.016    1.016 <string>:1(<module>)
>         1    0.136    0.000    0.928    0.000 _
Re: [Cython] Recent bugs in generators
Vitja Makarov, 17.04.2011 17:57:
> 3. check_yield_in_exception()

I added this because I found a failing pyregr test that uses it (testing
the @contextmanager decorator).

> Cython calls __Pyx_ExceptionReset when an except block is done, so when a
> yield is there, no exception reset is called.
>
> I'm not sure how to fix this.

I'm not completely sure either.

> import sys
>
> def foo():
>     """
>     >>> list(foo())
>     [<type 'exceptions.ValueError'>, None]
>     """
>     try:
>         raise ValueError
>     except ValueError:
>         yield sys.exc_info()[0]
>     yield sys.exc_info()[0]  # exc_info is lost here

I think (!) the difference here is that CPython actually keeps the
exception in the generator frame. We don't have a frame, so we have to
emulate it using the closure class. I guess we'll have to store away the
exception into the closure when we yield while an exception is being
handled, and restore it afterwards. Note: this is not the exception that is
freshly *being* raised (the "_cur*" fields in the thread state), it's the
exception that *was* raised and is now being handled, i.e. the thread state
fields without the "_cur", that are reflected by sys.exc_info().

Stefan
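(A minimal pure-Python illustration of that last point, using Python 3
semantics rather than Cython's C-level machinery: sys.exc_info() reflects
the exception that is currently being handled, and that state is dropped
again once the handling block is left:)

    import sys

    def current_exc():
        # A helper called from inside the except block still sees the
        # handled exception through sys.exc_info() -- the non-"_cur" state.
        return sys.exc_info()[0]

    try:
        raise ValueError("boom")
    except ValueError:
        print(current_exc())    # <class 'ValueError'>
    print(current_exc())        # None: in Python 3 the handled exception
                                # does not outlive the except block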
Re: [Cython] Recent bugs in generators
2011/4/18 Stefan Behnel:
> Vitja Makarov, 17.04.2011 17:57:
>>
>> 3. check_yield_in_exception()
>
> I added this because I found a failing pyregr test that uses it (testing
> the @contextmanager decorator).
>
>> Cython calls __Pyx_ExceptionReset when an except block is done, so when
>> a yield is there, no exception reset is called.
>>
>> I'm not sure how to fix this.
>
> I'm not completely sure either.
>
>> import sys
>>
>> def foo():
>>     """
>>     >>> list(foo())
>>     [<type 'exceptions.ValueError'>, None]
>>     """
>>     try:
>>         raise ValueError
>>     except ValueError:
>>         yield sys.exc_info()[0]
>>     yield sys.exc_info()[0]  # exc_info is lost here
>
> I think (!) the difference here is that CPython actually keeps the
> exception in the generator frame. We don't have a frame, so we have to
> emulate it using the closure class. I guess we'll have to store away the
> exception into the closure when we yield while an exception is being
> handled, and restore it afterwards. Note: this is not the exception that
> is freshly *being* raised (the "_cur*" fields in the thread state), it's
> the exception that *was* raised and is now being handled, i.e. the thread
> state fields without the "_cur", that are reflected by sys.exc_info().

Interesting difference between py2 and py3:

def foo():
    try:
        raise ValueError
    except ValueError:
        yield
    raise

list(foo())

Traceback (most recent call last):
  File "xxx.py", line 7, in <module>
    list(foo())
  File "xxx.py", line 6, in foo
    raise
TypeError: exceptions must be old-style classes or derived from
BaseException, not NoneType

It seems that exception info is completely lost (tried 2.6, 2.7) and seems
to be fixed in Python 3.

Btw, exception info temps are already saved and restored between yields.

--
vitja.
Re: [Cython] Recent bugs in generators
Vitja Makarov, 18.04.2011 06:38:
> 2011/4/18 Stefan Behnel:
>> Vitja Makarov, 17.04.2011 17:57:
>>> 3. check_yield_in_exception()
>>
>> I added this because I found a failing pyregr test that uses it (testing
>> the @contextmanager decorator).
>>
>>> Cython calls __Pyx_ExceptionReset when an except block is done, so when
>>> a yield is there, no exception reset is called.
>>>
>>> I'm not sure how to fix this.
>>
>> I'm not completely sure either.
>>
>>> import sys
>>>
>>> def foo():
>>>     """
>>>     >>> list(foo())
>>>     [<type 'exceptions.ValueError'>, None]
>>>     """
>>>     try:
>>>         raise ValueError
>>>     except ValueError:
>>>         yield sys.exc_info()[0]
>>>     yield sys.exc_info()[0]  # exc_info is lost here
>>
>> I think (!) the difference here is that CPython actually keeps the
>> exception in the generator frame. We don't have a frame, so we have to
>> emulate it using the closure class. I guess we'll have to store away the
>> exception into the closure when we yield while an exception is being
>> handled, and restore it afterwards. Note: this is not the exception that
>> is freshly *being* raised (the "_cur*" fields in the thread state), it's
>> the exception that *was* raised and is now being handled, i.e. the
>> thread state fields without the "_cur", that are reflected by
>> sys.exc_info().
>
> Interesting difference between py2 and py3:
>
> def foo():
>     try:
>         raise ValueError
>     except ValueError:
>         yield
>     raise
>
> list(foo())
>
> Traceback (most recent call last):
>   File "xxx.py", line 7, in <module>
>     list(foo())
>   File "xxx.py", line 6, in foo
>     raise
> TypeError: exceptions must be old-style classes or derived from
> BaseException, not NoneType
>
> It seems that exception info is completely lost (tried 2.6, 2.7) and
> seems to be fixed in Python 3.

Not surprising. The implementation is completely different in Py2 and Py3,
both in CPython and in Cython. It's actually much simpler in Cython under
Py3, due to better semantics and C-API support. That also implies that
there's much less Cython can do wrong in that environment. ;-)

> Btw, exception info temps are already saved and restored between yields.

Right, but the exc_info itself is not reset and recovered around the yield.
As I said above, generators have their own lifetime frame in CPython, and
exceptions don't leak from that. So, whenever it's the generator (or code
called by it) that raises an exception, that must be kept local to the
generator.

Stefan
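(A small sketch of the behaviour that has to be reproduced, shown with
plain CPython 3.x generators; this demonstrates the target semantics, not
Cython's implementation:)

    import sys

    def gen(exc_type):
        try:
            raise exc_type()
        except exc_type:
            # Suspend while the exception is being handled ...
            yield sys.exc_info()[0]
            # ... and it is still the handled exception after resuming.
            yield sys.exc_info()[0]

    a, b = gen(ValueError), gen(KeyError)
    print(next(a), next(b))   # <class 'ValueError'> <class 'KeyError'>

    # While both generators are suspended inside their except blocks, the
    # caller's exception state is untouched:
    print(sys.exc_info()[0])  # None

    # Each generator resumes with its own exception intact:
    print(next(a), next(b))   # <class 'ValueError'> <class 'KeyError'>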