Re: [Cython] Speedup module-level lookup
2012/1/21 Stefan Behnel :
> Chris Colbert, 19.01.2012 09:18:
>> If it doesn't pass PyDict_CheckExact you won't be able to use it as the
>> globals to eval or exec.
>
> What makes you say that? I tried and it worked for me, all the way back to
> Python 2.4:
>
> Python 2.4.6 (#2, Jan 21 2010, 23:45:25)
> [GCC 4.4.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> class MyDict(dict): pass
> >>> eval('1+1', MyDict())
> 2
> >>> exec '1+1' in MyDict()
>
> I only see a couple of calls to PyDict_CheckExact() in CPython's sources
> and they usually seem to be related to special casing for performance
> reasons. Nothing that should impact a module's globals.
>
> Besides, Cython controls its own language usages of eval and exec.

Cool!
It seems that Python internally uses PyObject_GetItem() for module-level
lookups and not PyDict_GetItem().
Btw, we use __Pyx_GetName(), which calls PyObject_GetAttr(); that isn't
exactly the same as a module lookup:

# Works in Cython and doesn't work in Python
print __class__

So we can override __getitem__() and __setitem__():

class MyDict(dict):
    def __init__(self):
        self._dict = {}

    def __getitem__(self, key):
        print '__getitem__', key
        return self._dict[key]

    def __setitem__(self, key, value):
        print '__setitem__', key, value
        self._dict[key] = value

    def __getattr__(self, key):
        print '__getattr__'

d = MyDict()
exec('x = 1; print x', d)
eval('x', d)

$ python foo.py
__setitem__ x 1
__getitem__ x
1
__getitem__ x

So we can make globals() return a special dict with custom
__setitem__()/__getitem__(). But it seems that we would have to override
many of dict's standard methods, like values(), update() and so on. That
would be hard.

--
vitja.
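A minimal sketch of that last point (Python 2, like the rest of the thread; the
class name LoggingDict is made up): several of dict's standard methods go
straight to the C-level dict storage and never call the Python-level overrides,
which is why a globals replacement would have to reimplement far more than
__getitem__()/__setitem__().

# Sketch: which operations go through the overrides and which do not.
class LoggingDict(dict):
    def __getitem__(self, key):
        print '__getitem__', key
        return dict.__getitem__(self, key)

    def __setitem__(self, key, value):
        print '__setitem__', key, value
        dict.__setitem__(self, key, value)

d = LoggingDict(a=1)

d['a']            # goes through __getitem__
d['b'] = 2        # goes through __setitem__

d.get('a')        # no __getitem__ call: get() reads the C-level storage
d.update(c=3)     # no __setitem__ call: update() writes the storage directly
print d.values()  # likewise reads the storage directly

So a namespace replacement would also have to reimplement get(), update(),
values(), pop(), iteration, and friends to stay consistent.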
Re: [Cython] Speedup module-level lookup
2012/1/21 Stefan Behnel :
> Vitja Makarov, 19.01.2012 08:49:
>> 2012/1/19 Robert Bradshaw:
>>> On Wed, Jan 18, 2012 at 12:30 PM, Vitja Makarov wrote:
>>>> I tried to optimize module lookups (__pyx_m) by caching internal
>>>> PyDict state.
>>>> In this example bar() is 1.6 times faster (500us against 842us):
>>>>
>>>> C = 123
>>>>
>>>> def foo(a):
>>>>     return C * a
>>>>
>>>> def bar():
>>>>     for i in range(1):
>>>>         foo(i)
>>>>
>>>> Here is a proof of concept:
>>>> https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9
>>>>
>>>> So the question is: is it worth it?
>>>
>>> I think the right thing to do here is make all module-level globals
>>> into "cdef public" attributes, i.e. C globals with getters and setters
>>> for Python space. I'm not sure whether this would best be done by
>>> creating a custom dict or module subclass, but it would probably be
>>> cleaner and afford much more than a 1.6x speedup.
>>
>> Yes, nice idea.
>> It's possible to subclass PyModuleObject and I didn't find any use of
>> PyModule_CheckExact() in CPython's sources:
>>
>> import types
>> import sys
>>
>> global_foo = 1234
>>
>> class CustomModule(types.ModuleType):
>>     def __init__(self, name):
>>         types.ModuleType.__init__(self, name)
>>         sys.modules[name] = self
>>
>>     @property
>>     def foo(self):
>>         return global_foo
>>
>>     @foo.setter
>>     def foo(self, value):
>>         global global_foo
>>         global_foo = value
>>
>> CustomModule('foo')
>>
>> import foo
>> print foo.foo
>
> The one thing I don't currently see is how to get the module subtype
> instantiated in a safe and portable way.

We can do the same as the types module does: ModuleType = type(sys), or
type(__builtins__), since we already have it (__pyx_b).

> The normal way to create the module in Python 2.x is a call to
> Py_InitModule*(), which internally does a PyImport_AddModule(). We may get
> away with creating and registering the module object before calling into
> Py_InitModule*(), so that PyImport_AddModule() finds it there. At least,
> the internal checks on modules seem to use PyModule_Check() and not
> PyModule_CheckExact(), so someone seems to have already thought about this.
>
> In Python 3.x, the situation is different. There is no lookup involved and
> the module is always newly instantiated. That may mean that we have to copy
> the module creation code into Cython. But that doesn't look like a huge
> drawback (except for compatibility to potential future changes), because we
> already do most of the module initialisation ourselves anyway, especially
> now that we have CyFunction.
>
> I start feeling a bit like Linus Torvalds when he broke his minix
> installation and went: "ok, what else do I need to add to this terminal
> emulator in order to make it an operating system?"

--
vitja.
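For what it's worth, a rough Python-level sketch of that combination (the
module name 'mymod' and the foo attribute are invented for illustration):
type(sys) yields ModuleType without importing the types module, and an
instance of a module subclass registered in sys.modules up front is exactly
what a later import statement picks up.

# Sketch only: Python-level analogue of what the generated module init
# code could do ('mymod' and 'foo' are made-up names).
import sys

ModuleType = type(sys)        # same object as types.ModuleType

_global_foo = 1234            # stands in for a C-level module global

class FastModule(ModuleType):
    @property
    def foo(self):
        return _global_foo

    @foo.setter
    def foo(self, value):
        global _global_foo
        _global_foo = value

# Register the subclass instance before anything else creates or looks up
# the module, so that the import machinery finds this object.
sys.modules['mymod'] = FastModule('mymod')

import mymod
print mymod.foo               # 1234, served by the property
mymod.foo += 1                # routed through the property setter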
Re: [Cython] Speedup module-level lookup
On Sat, Jan 21, 2012 at 2:35 AM, Vitja Makarov wrote:
> 2012/1/21 Stefan Behnel :
> > Chris Colbert, 19.01.2012 09:18:
> >> If it doesn't pass PyDict_CheckExact you won't be able to use it as the
> >> globals to eval or exec.
> >
> > What makes you say that? I tried and it worked for me, all the way back to
> > Python 2.4:
> >
> > Python 2.4.6 (#2, Jan 21 2010, 23:45:25)
> > [GCC 4.4.1] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> class MyDict(dict): pass
> > >>> eval('1+1', MyDict())
> > 2
> > >>> exec '1+1' in MyDict()
> >
> > I only see a couple of calls to PyDict_CheckExact() in CPython's sources
> > and they usually seem to be related to special casing for performance
> > reasons. Nothing that should impact a module's globals.
> >
> > Besides, Cython controls its own language usages of eval and exec.
>
> Cool!
> It seems that Python internally uses PyObject_GetItem() for module
> level lookups and not PyDict_GetItem().
> Btw, we use __Pyx_GetName(), which calls PyObject_GetAttr(); that isn't
> exactly the same as a module lookup:
>
> # Works in Cython and doesn't work in Python
> print __class__
>
> So we can override __getitem__() and __setitem__():
>
> class MyDict(dict):
>     def __init__(self):
>         self._dict = {}
>
>     def __getitem__(self, key):
>         print '__getitem__', key
>         return self._dict[key]
>
>     def __setitem__(self, key, value):
>         print '__setitem__', key, value
>         self._dict[key] = value
>
>     def __getattr__(self, key):
>         print '__getattr__'
>
> d = MyDict()
> exec('x = 1; print x', d)
> eval('x', d)
>
> $ python foo.py
> __setitem__ x 1
> __getitem__ x
> 1
> __getitem__ x
>
> So we can make globals() return a special dict with custom
> __setitem__()/__getitem__(). But it seems that we would have to override
> many of dict's standard methods, like values(), update() and so on. That
> would be hard.

Be careful. That only works because your dict subclass is being used as the
locals as well. The LOAD_NAME opcode does a PyDict_CheckExact on the locals
and will call PyDict_GetItem if true, PyObject_GetItem if false:

case LOAD_NAME:
    w = GETITEM(names, oparg);
    if ((v = f->f_locals) == NULL) {
        PyErr_Format(PyExc_SystemError,
                     "no locals when loading %s",
                     PyObject_REPR(w));
        why = WHY_EXCEPTION;
        break;
    }
    if (PyDict_CheckExact(v)) {
        x = PyDict_GetItem(v, w);
        Py_XINCREF(x);
    }
    else {
        x = PyObject_GetItem(v, w);
        if (x == NULL && PyErr_Occurred()) {
            if (!PyErr_ExceptionMatches(
                            PyExc_KeyError))
                break;
            PyErr_Clear();
        }
    }

You can see that the dict subclassing breaks down when you pass an empty
dict as the locals:

In [1]: class Foo(dict):
   ...:     def __getitem__(self, name):
   ...:         print 'get', name
   ...:         return super(Foo, self).__getitem__(name)
   ...:

In [2]: f = Foo(a=42)

In [3]: eval('a', f)
get a
Out[3]: 42

In [4]: eval('a', f, {})
Out[4]: 42
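To spell out the asymmetry, here is a simplified Python paraphrase of the
opcode above (not CPython source; the globals/builtins fallback is condensed
from the rest of the LOAD_NAME implementation): only the locals step dispatches
through the mapping protocol for non-exact dicts, while the later steps use
plain dict lookups, so a __getitem__ override on the globals is never consulted.

# Simplified Python paraphrase of LOAD_NAME (not actual CPython code).
_missing = object()   # stand-in for NULL

def load_name(name, f_locals, f_globals, f_builtins):
    if type(f_locals) is dict:                    # PyDict_CheckExact()
        value = f_locals.get(name, _missing)      # PyDict_GetItem()
    else:
        try:
            value = f_locals[name]                # PyObject_GetItem()
        except KeyError:
            value = _missing
    if value is _missing:
        value = f_globals.get(name, _missing)     # plain dict lookup
    if value is _missing:
        value = f_builtins.get(name, _missing)    # plain dict lookup
    if value is _missing:
        raise NameError("name %r is not defined" % name)
    return value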
Re: [Cython] Speedup module-level lookup
2012/1/21 Chris Colbert :
>
> On Sat, Jan 21, 2012 at 2:35 AM, Vitja Makarov wrote:
>>
>> 2012/1/21 Stefan Behnel :
>> > Chris Colbert, 19.01.2012 09:18:
>> >> If it doesn't pass PyDict_CheckExact you won't be able to use it as the
>> >> globals to eval or exec.
>> >
>> > What makes you say that? I tried and it worked for me, all the way back to
>> > Python 2.4:
>> >
>> > Python 2.4.6 (#2, Jan 21 2010, 23:45:25)
>> > [GCC 4.4.1] on linux2
>> > Type "help", "copyright", "credits" or "license" for more information.
>> > >>> class MyDict(dict): pass
>> > >>> eval('1+1', MyDict())
>> > 2
>> > >>> exec '1+1' in MyDict()
>> >
>> > I only see a couple of calls to PyDict_CheckExact() in CPython's sources
>> > and they usually seem to be related to special casing for performance
>> > reasons. Nothing that should impact a module's globals.
>> >
>> > Besides, Cython controls its own language usages of eval and exec.
>>
>> Cool!
>> It seems that Python internally uses PyObject_GetItem() for module
>> level lookups and not PyDict_GetItem().
>> Btw, we use __Pyx_GetName(), which calls PyObject_GetAttr(); that isn't
>> exactly the same as a module lookup:
>>
>> # Works in Cython and doesn't work in Python
>> print __class__
>>
>> So we can override __getitem__() and __setitem__():
>>
>> class MyDict(dict):
>>     def __init__(self):
>>         self._dict = {}
>>
>>     def __getitem__(self, key):
>>         print '__getitem__', key
>>         return self._dict[key]
>>
>>     def __setitem__(self, key, value):
>>         print '__setitem__', key, value
>>         self._dict[key] = value
>>
>>     def __getattr__(self, key):
>>         print '__getattr__'
>>
>> d = MyDict()
>> exec('x = 1; print x', d)
>> eval('x', d)
>>
>> $ python foo.py
>> __setitem__ x 1
>> __getitem__ x
>> 1
>> __getitem__ x
>>
>> So we can make globals() return a special dict with custom
>> __setitem__()/__getitem__(). But it seems that we would have to override
>> many of dict's standard methods, like values(), update() and so on. That
>> would be hard.
>
> Be careful. That only works because your dict subclass is being used as the
> locals as well. The LOAD_NAME opcode does a PyDict_CheckExact on the locals
> and will call PyDict_GetItem if true, PyObject_GetItem if false:
>
> case LOAD_NAME:
>     w = GETITEM(names, oparg);
>     if ((v = f->f_locals) == NULL) {
>         PyErr_Format(PyExc_SystemError,
>                      "no locals when loading %s",
>                      PyObject_REPR(w));
>         why = WHY_EXCEPTION;
>         break;
>     }
>     if (PyDict_CheckExact(v)) {
>         x = PyDict_GetItem(v, w);
>         Py_XINCREF(x);
>     }
>     else {
>         x = PyObject_GetItem(v, w);
>         if (x == NULL && PyErr_Occurred()) {
>             if (!PyErr_ExceptionMatches(
>                             PyExc_KeyError))
>                 break;
>             PyErr_Clear();
>         }
>     }
>
> You can see that the dict subclassing breaks down when you pass an empty
> dict as the locals:
>
> In [1]: class Foo(dict):
>    ...:     def __getitem__(self, name):
>    ...:         print 'get', name
>    ...:         return super(Foo, self).__getitem__(name)
>    ...:
>
> In [2]: f = Foo(a=42)
>
> In [3]: eval('a', f)
> get a
> Out[3]: 42
>
> In [4]: eval('a', f, {})
> Out[4]: 42

Nice catch! It seems that globals MUST be a real dict.

>>> help(eval)
eval(...)
    eval(source[, globals[, locals]]) -> value

    Evaluate the source in the context of globals and locals.
    The source may be a string representing a Python expression
    or a code object as returned by compile().
    The globals must be a dictionary and locals can be any mapping,
    defaulting to the current globals and locals.
    If only globals is given, locals defaults to it.

--
vitja.
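A quick way to see that constraint in action (a sketch; the exact TypeError
wording differs between CPython versions): an arbitrary mapping is accepted as
the locals argument but rejected as the globals argument.

# Sketch: eval() accepts any mapping as locals, but insists on a real
# dict (or dict subclass) for globals.
class Mapping(object):
    def __init__(self, data):
        self.data = data
    def __getitem__(self, key):
        return self.data[key]

print eval('a', {}, Mapping({'a': 42}))   # 42: locals may be any mapping

try:
    eval('a', Mapping({'a': 42}))          # globals must be a real dict
except TypeError as exc:
    print exc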
[Cython] AddTraceback() slows down generators
Hi,

I did some callgrind profiling on Cython's generators and was surprised to
find that AddTraceback() represents a serious performance penalty for short
running generators.

I profiled a compiled Python implementation of itertools.groupby(), which
yields (key, group) tuples where the group is an iterator again. I ran this
code in Python for benchmarking:

"""
L = sorted(range(1000)*5)

all(list(g) for k,g in groupby(L))
"""

Groups tend to be rather short in real code, often just one or a couple of
items, so unpacking the group iterator into a list will usually be a quick
loop and then the generator raises StopIteration on termination and builds
a traceback for it. According to callgrind (which, I should note, tends to
overestimate the amount of time spent in memory allocation), the iteration
during the group unpacking takes about 30% of the overall runtime of the
all() loop, and the AddTraceback() call at the end of each group traversal
takes up to 25% (!) on my side. That means that more than 80% of the group
unpacking time goes into raising StopIteration from the generators. I
attached the call graph with the relative timings.

About half of the exception raising time is eaten by PyString_FromFormat()
that builds the function-name + line-position string (which, I may note, is
basically a convenience feature). This string is a constant for a
generator's StopIteration exception, at least for each final return point
in a generator, but here it is being recreated over and over again, for
each exception that gets raised.

Even if we keep creating a new frame instance each time (which should be ok
because CPython has a frame instance cache already and we'd only create one
during the generator lifetime), the whole code object could actually be
cached after the first creation, preferably bound to the lifetime of the
generator creator function/method. Or, more generally, one code object per
generator termination point, which will be a single point in the majority
of cases. For the specific code above, that should shave off almost 20% of
the overall runtime of the all() loop.

I think that's totally worth doing.

Stefan
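To make the caching idea concrete, here is a Python-level sketch (the helper
and cache names are invented; the actual change would go into Cython's
generated C code around AddTraceback()): the formatted position string is a
constant per termination point, so it only needs to be built once and can be
looked up afterwards.

# Sketch of the caching idea in Python terms (names are made up).
_position_cache = {}

def position_string(funcname, filename, lineno):
    # Corresponds to the PyString_FromFormat() call that currently runs
    # for every single raised exception; the result never changes for a
    # given termination point, so cache it.
    key = (funcname, filename, lineno)
    try:
        return _position_cache[key]
    except KeyError:
        s = '%s (%s:%d)' % (funcname, filename, lineno)
        _position_cache[key] = s
        return s

The same memoization would apply one level up to the code object created from
that string, ideally tied to the lifetime of the function that creates the
generator.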
Re: [Cython] AddTraceback() slows down generators
On 01/21/2012 07:50 PM, Stefan Behnel wrote:
> Hi,
>
> I did some callgrind profiling on Cython's generators and was surprised to
> find that AddTraceback() represents a serious performance penalty for short
> running generators.
>
> I profiled a compiled Python implementation of itertools.groupby(), which
> yields (key, group) tuples where the group is an iterator again. I ran this
> code in Python for benchmarking:
>
> """
> L = sorted(range(1000)*5)
>
> all(list(g) for k,g in groupby(L))
> """
>
> Groups tend to be rather short in real code, often just one or a couple of
> items, so unpacking the group iterator into a list will usually be a quick
> loop and then the generator raises StopIteration on termination and builds
> a traceback for it. According to callgrind (which, I should note, tends to
> overestimate the amount of time spent in memory allocation), the iteration
> during the group unpacking takes about 30% of the overall runtime of the
> all() loop, and the AddTraceback() call at the end of each group traversal
> takes up to 25% (!) on my side. That means that more than 80% of the group
> unpacking time goes into raising StopIteration from the generators. I
> attached the call graph with the relative timings.

OT: Since you complain that callgrind is inaccurate: are you aware of
sampling profilers, such as Google perftools? (I don't have experience with
callgrind myself.)

http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html
http://pypi.python.org/pypi/yep

Dag

> About half of the exception raising time is eaten by PyString_FromFormat()
> that builds the function-name + line-position string (which, I may note, is
> basically a convenience feature). This string is a constant for a
> generator's StopIteration exception, at least for each final return point
> in a generator, but here it is being recreated over and over again, for
> each exception that gets raised.
>
> Even if we keep creating a new frame instance each time (which should be ok
> because CPython has a frame instance cache already and we'd only create one
> during the generator lifetime), the whole code object could actually be
> cached after the first creation, preferably bound to the lifetime of the
> generator creator function/method. Or, more generally, one code object per
> generator termination point, which will be a single point in the majority
> of cases. For the specific code above, that should shave off almost 20% of
> the overall runtime of the all() loop.
>
> I think that's totally worth doing.
>
> Stefan
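For reference, yep (linked above) wraps the google-perftools CPU profiler
behind a simple start/stop pair. A minimal sketch of profiling the benchmark
from the quoted mail might look like the following, assuming yep and
google-perftools are installed and using the start()/stop() API described on
the linked page (the output file name is arbitrary):

# Sketch: sampling-profile the groupby benchmark with yep/google-perftools.
import yep
from itertools import groupby   # or the compiled Cython version

L = sorted(range(1000) * 5)

yep.start('groupby.prof')       # start the sampling CPU profiler
for _ in range(100):
    all(list(g) for k, g in groupby(L))
yep.stop()                      # stop and write the profile

# The result can then be inspected with google-perftools' pprof, e.g.:
#   pprof --text `which python` groupby.prof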
Re: [Cython] AddTraceback() slows down generators
On Sat, Jan 21, 2012 at 10:50 AM, Stefan Behnel wrote:
> Hi,
>
> I did some callgrind profiling on Cython's generators and was surprised to
> find that AddTraceback() represents a serious performance penalty for short
> running generators.
>
> I profiled a compiled Python implementation of itertools.groupby(), which
> yields (key, group) tuples where the group is an iterator again. I ran this
> code in Python for benchmarking:
>
> """
> L = sorted(range(1000)*5)
>
> all(list(g) for k,g in groupby(L))
> """
>
> Groups tend to be rather short in real code, often just one or a couple of
> items, so unpacking the group iterator into a list will usually be a quick
> loop and then the generator raises StopIteration on termination and builds
> a traceback for it. According to callgrind (which, I should note, tends to
> overestimate the amount of time spent in memory allocation), the iteration
> during the group unpacking takes about 30% of the overall runtime of the
> all() loop, and the AddTraceback() call at the end of each group traversal
> takes up to 25% (!) on my side. That means that more than 80% of the group
> unpacking time goes into raising StopIteration from the generators. I
> attached the call graph with the relative timings.
>
> About half of the exception raising time is eaten by PyString_FromFormat()
> that builds the function-name + line-position string (which, I may note, is
> basically a convenience feature). This string is a constant for a
> generator's StopIteration exception, at least for each final return point
> in a generator, but here it is being recreated over and over again, for
> each exception that gets raised.
>
> Even if we keep creating a new frame instance each time (which should be ok
> because CPython has a frame instance cache already and we'd only create one
> during the generator lifetime), the whole code object could actually be
> cached after the first creation, preferably bound to the lifetime of the
> generator creator function/method. Or, more generally, one code object per
> generator termination point, which will be a single point in the majority
> of cases. For the specific code above, that should shave off almost 20% of
> the overall runtime of the all() loop.
>
> I think that's totally worth doing.

Makes sense to me. I did some caching like this for profiling.

- Robert