2011/12/27 Stefan Behnel <stefan...@behnel.de>: > Vitja Makarov, 26.12.2011 20:07: >> >> 2011/12/25 Stefan Behnel: >> >>> Stefan Behnel, 21.12.2011 21:17: >>>> >>>> Vitja Makarov, 21.12.2011 19:48: >>>>> >>>>> >>>>> Some time ago we were talking about generators optimization by copying >>>>> local variables from closure into local scope. >>>> >>>> >>>> Yes, I think that will make it easier for the C compiler to make >>>> optimistic assumptions about external values. >>>> >>>>> Now I think that should be a good idea to implement this for both >>>>> generators and regular closure functions. So local var will be used >>>>> for reference and assignment should be made to both copies. Of course >>>>> there are some variables that shouldn't be copied: non-local vars, >>>>> arrays, C++ classes and structures. >>>> >>>> >>>> >>>> Basically, anything that external code can modify. That makes it a bit >>>> tricky to do it also for 'normal' closure functions - the whole idea is >>>> that there is more than one function that can refer to a variable. >>>> >>>>> Also it may be a good idea to move outer scope pointer into local >>>>> variable. >>>>> >>>>> So I'm wondering what is a good test to measure actual speedup? >>>> >>>> >>>> >>>> http://blog.behnel.de/index.php?p=163 >>>> >>>> Just take the plain Python versions of the iterparse functions and >>>> compare >>>> them before and after the change. The raw C implementation in CPython >>>> gives a good baseline. >>>> >>>> Actually, it would be generally interesting to run the Cython versions >>>> through callgrind to see where the time is actually being spent. >>> >>> >>> Another useful test is the "nqueens" benchmark from the CPython test >>> suite. >>> It's regularly run on Jenkins in Py2.7 and 3.3. >>> >>> https://sage.math.washington.edu:8091/hudson/view/bench/ >>> >>> Note that it mostly uses generator expressions, which could easily >>> benefit >>> from a couple of further optimisations by specialising them, e.g. by >>> providing a length hint. >>> >>> http://trac.cython.org/cython_trac/ticket/756 >>> >> >> I have implemented local variable copying, it could be found here: >> >> https://github.com/vitek/cython/tree/_copy_closure > > > Cool. > > > >> I didn't noticed significant speedup running nqueens test. Indeed I'm >> not sure it's speedup. >> It's all about<+-2%. Perhaps some better test is required. > > > Well, yes, I'd guess that generator expressions are really not the target of > this optimisation. They simply don't do enough in between two iterations. > > > >> Btw, I got ~8% speedup for this dummy test: >> def foo(): >> cdef int i, r >> cdef list o >> o = [] >> def bar(): >> return len(o) >> for i in range(10000000): >> bar() >> r += len(o) >> return r > > > Any speedup will help someone, I guess. And it's good to know that it makes > a difference if the closure code isn't plain trivial. > > >> What's iterparse()? > > > Ah, sorry, my bad. Too much lxml stuff lately, I guess. I really meant > "itertools". See the blog link above. > > Also, consider using callgrind for performance analysis. For one, it'll > print the number of executed instructions after a test run. For Cython > generated ode, it's usually a good sign if that decreases after a change. > For a deeper analysis of such a profiling run, you can then use kcachegrind. > >
I tried count() generator from your blog: optimized version is about 1% slower. I think this optimization would make sens for regular closures and generators that do some real work inside. -- vitja. _______________________________________________ cython-devel mailing list cython-devel@python.org http://mail.python.org/mailman/listinfo/cython-devel