2012/1/27 Stefan Behnel <stefan...@behnel.de>:
> Robert Bradshaw, 21.01.2012 23:09:
>> On Sat, Jan 21, 2012 at 10:50 AM, Stefan Behnel wrote:
>>> I did some callgrind profiling on Cython's generators and was surprised to
>>> find that AddTraceback() represents a serious performance penalty for
>>> short-running generators.
>>>
>>> I profiled a compiled Python implementation of itertools.groupby(), which
>>> yields (key, group) tuples where the group is an iterator again. I ran this
>>> code in Python for benchmarking:
>>>
>>> """
>>> L = sorted(range(1000)*5)
>>>
>>> all(list(g) for k,g in groupby(L))
>>> """
>>>
>>> Groups tend to be rather short in real code, often just one or a couple of
>>> items, so unpacking the group iterator into a list will usually be a quick
>>> loop, and then the generator raises StopIteration on termination and builds
>>> a traceback for it. According to callgrind (which, I should note, tends to
>>> overestimate the amount of time spent in memory allocation), the iteration
>>> during the group unpacking takes about 30% of the overall runtime of the
>>> all() loop, and the AddTraceback() call at the end of each group traversal
>>> takes up to 25% (!) on my side. That means that more than 80% of the group
>>> unpacking time goes into raising StopIteration from the generators. I
>>> attached the call graph with the relative timings.
>>>
>>> About half of the exception raising time is eaten by PyString_FromFormat(),
>>> which builds the function-name + line-position string (which, I may note,
>>> is basically a convenience feature). This string is a constant for a
>>> generator's StopIteration exception, at least for each final return point
>>> in a generator, but here it is being recreated over and over again, for
>>> each exception that gets raised.
>>>
>>> Even if we keep creating a new frame instance each time (which should be ok
>>> because CPython has a frame instance cache already and we'd only create one
>>> during the generator lifetime), the whole code object could actually be
>>> cached after the first creation, preferably bound to the lifetime of the
>>> generator creator function/method. Or, more generally, one code object per
>>> generator termination point, which will be a single point in the majority
>>> of cases. For the specific code above, that should shave off almost 20% of
>>> the overall runtime of the all() loop.
>>>
>>> I think that's totally worth doing.
>>
>> Makes sense to me. I did some caching like this for profiling.
>
> Here's a ticket for now:
>
> http://trac.cython.org/cython_trac/ticket/760
>
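The "compiled Python implementation of itertools.groupby()" quoted above is not
shown in the thread; presumably it follows the pure-Python equivalent from the
itertools documentation, roughly sketched below (class and helper names such as
_grouper are taken from that documented equivalent, not from the code that was
actually profiled). The point to note is that each (key, group) pair hands out a
fresh, short-lived generator, so the benchmark exhausts one generator, and thus
raises one StopIteration, per group of five items.

    # Hedged sketch, not the exact code that was profiled: a groupby()
    # written in Python along the lines of the pure-Python equivalent in
    # the itertools docs. Compiled with Cython, every exhausted _grouper()
    # generator pays the AddTraceback() cost discussed above.

    class groupby:
        def __init__(self, iterable, key=None):
            self.keyfunc = key if key is not None else (lambda x: x)
            self.it = iter(iterable)
            self.tgtkey = self.currkey = self.currvalue = object()

        def __iter__(self):
            return self

        def __next__(self):
            # Skip ahead to the first value of the next group.
            while self.currkey == self.tgtkey:
                self.currvalue = next(self.it)   # may raise StopIteration
                self.currkey = self.keyfunc(self.currvalue)
            self.tgtkey = self.currkey
            return (self.currkey, self._grouper(self.tgtkey))

        def _grouper(self, tgtkey):
            # A short-lived generator: in the benchmark below it yields only
            # five items before terminating, i.e. before StopIteration.
            while self.currkey == tgtkey:
                yield self.currvalue
                try:
                    self.currvalue = next(self.it)
                except StopIteration:            # explicit return (PEP 479)
                    return
                self.currkey = self.keyfunc(self.currvalue)

    # The benchmark from the thread, spelled for Python 3:
    L = sorted(list(range(1000)) * 5)
    assert all(list(g) for k, g in groupby(L))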
I think that could be fixed easily: CPython doesn't add any traceback info
when a generator ends.

https://github.com/vitek/cython/commit/63620bc2a29f3064bbdf7a49eefffaae4e3c369d

--
vitja.
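For reference, the CPython behaviour mentioned above can be checked directly
with plain (non-Cython) code; the generator name here is just illustrative.
When a generator runs off the end, the resulting StopIteration only collects
traceback entries for the frames it propagates through, not for the
generator's own frame:

    import traceback

    def short_gen():
        yield 1            # after this, the generator simply runs off the end

    g = short_gen()
    next(g)                # consumes the only value
    try:
        next(g)            # exhausted -> StopIteration
    except StopIteration:
        # The printed traceback contains only the caller's frame (this
        # module, at the `next(g)` line above); CPython does not append a
        # frame for short_gen() itself when it terminates normally.
        traceback.print_exc()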