Stefan Behnel, 26.01.2012 21:57: > Vitja Makarov, 26.01.2012 21:19: >> 2012/1/27 Stefan Behnel: >>> Robert Bradshaw, 21.01.2012 23:09: >>>> On Sat, Jan 21, 2012 at 10:50 AM, Stefan Behnel wrote: >>>>> I did some callgrind profiling on Cython's generators and was surprised to >>>>> find that AddTraceback() represents a serious performance penalty for >>>>> short >>>>> running generators. >>>>> >>>>> I profiled a compiled Python implementation of itertools.groupby(), which >>>>> yields (key, group) tuples where the group is an iterator again. I ran >>>>> this >>>>> code in Python for benchmarking: >>>>> >>>>> """ >>>>> L = sorted(range(1000)*5) >>>>> >>>>> all(list(g) for k,g in groupby(L)) >>>>> """ >>>>> >>>>> Groups tend to be rather short in real code, often just one or a couple of >>>>> items, so unpacking the group iterator into a list will usually be a quick >>>>> loop and then the generator raises StopIteration on termination and builds >>>>> a traceback for it. According to callgrind (which, I should note, tends to >>>>> overestimate the amount of time spent in memory allocation), the iteration >>>>> during the group unpacking takes about 30% of the overall runtime of the >>>>> all() loop, and the AddTraceback() call at the end of each group traversal >>>>> takes up to 25% (!) on my side. That means that more than 80% of the group >>>>> unpacking time goes into raising StopIteration from the generators. I >>>>> attached the call graph with the relative timings. >>>>> >>>>> About half of the exception raising time is eaten by PyString_FromFormat() >>>>> that builds the function-name + line-position string (which, I may note, >>>>> is >>>>> basically a convenience feature). This string is a constant for a >>>>> generator's StopIteration exception, at least for each final return point >>>>> in a generator, but here it is being recreated over and over again, for >>>>> each exception that gets raised. >>>>> >>>>> Even if we keep creating a new frame instance each time (which should be >>>>> ok >>>>> because CPython has a frame instance cache already and we'd only create >>>>> one >>>>> during the generator lifetime), the whole code object could actually be >>>>> cached after the first creation, preferably bound to the lifetime of the >>>>> generator creator function/method. Or, more generally, one code object per >>>>> generator termination point, which will be a single point in the majority >>>>> of cases. For the specific code above, that should shave off almost 20% of >>>>> the overall runtime of the all() loop. >> >> I think that could be easily fixed. CPython doesn't add any traceback >> info for generator's ending. >> >> https://github.com/vitek/cython/commit/63620bc2a29f3064bbdf7a49eefffaae4e3c369d > > Interesting. It doesn't solve the general problem of slow exceptions, but I > think that's a beautiful way of fixing this particular performance problem > for generators.
One thing that it doesn't fix is when a generator gets its input from another iterator and terminates automatically by passing on the StopIteration that the iterator raises. I.e. any exception *propagation* is still substantially slower than necessary, and that's a general issue. Stefan _______________________________________________ cython-devel mailing list cython-devel@python.org http://mail.python.org/mailman/listinfo/cython-devel