Re: [Cython] Speedup module-level lookup
2012/1/21 Stefan Behnel :
> Chris Colbert, 19.01.2012 09:18:
>> If it doesn't pass PyDict_CheckExact you won't be able to use it as the
>> globals to eval or exec.
>
> What makes you say that? I tried and it worked for me, all the way back to
> Python 2.4:
>
> Python 2.4.6 (#2, Jan 21 2010, 23:45:25)
> [GCC 4.4.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> class MyDict(dict): pass
> >>> eval('1+1', MyDict())
> 2
> >>> exec '1+1' in MyDict()
>
> I only see a couple of calls to PyDict_CheckExact() in CPython's sources
> and they usually seem to be related to special casing for performance
> reasons. Nothing that should impact a module's globals.
>
> Besides, Cython controls its own language usages of eval and exec.

Cool!
It seems that Python internally uses PyObject_GetItem() for module-level
lookups and not PyDict_GetItem().
Btw, we use __Pyx_GetName(), which calls PyObject_GetAttr(); that isn't
exactly the same as a module lookup:

# Works in Cython and doesn't work in Python
print __class__

So we can override __getitem__() and __setitem__():

class MyDict(dict):
    def __init__(self):
        self._dict = {}

    def __getitem__(self, key):
        print '__getitem__', key
        return self._dict[key]

    def __setitem__(self, key, value):
        print '__setitem__', key, value
        self._dict[key] = value

    def __getattr__(self, key):
        print '__getattr__'

d = MyDict()
exec('x = 1; print x', d)
eval('x', d)

$ python foo.py
__setitem__ x 1
__getitem__ x
1
__getitem__ x

So we can make globals() return a special dict with custom
__setitem__()/__getitem__(). But it seems that we would have to override
many of dict's standard methods, like values(), update() and so on. That
would be hard.

--
vitja.
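A minimal sketch of that last point (Python 2, like the rest of the thread; the
class name LoggingDict is made up): several of dict's standard methods go
straight to the C-level dict storage and never call the Python-level overrides,
which is why a globals replacement would have to reimplement far more than
__getitem__()/__setitem__().

# Sketch: which operations go through the overrides and which do not.
class LoggingDict(dict):
    def __getitem__(self, key):
        print '__getitem__', key
        return dict.__getitem__(self, key)

    def __setitem__(self, key, value):
        print '__setitem__', key, value
        dict.__setitem__(self, key, value)

d = LoggingDict(a=1)

d['a']            # goes through __getitem__
d['b'] = 2        # goes through __setitem__

d.get('a')        # no __getitem__ call: get() reads the C-level storage
d.update(c=3)     # no __setitem__ call: update() writes the storage directly
print d.values()  # likewise reads the storage directly

So a namespace replacement would also have to reimplement get(), update(),
values(), pop(), iteration, and friends to stay consistent.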
Re: [Cython] Speedup module-level lookup
2012/1/21 Stefan Behnel :
> Vitja Makarov, 19.01.2012 08:49:
>> 2012/1/19 Robert Bradshaw:
>>> On Wed, Jan 18, 2012 at 12:30 PM, Vitja Makarov wrote:
>>>> I tried to optimize module lookups (__pyx_m) by caching internal
>>>> PyDict state.
>>>> In this example bar() is 1.6 times faster (500us against 842us):
>>>>
>>>> C = 123
>>>>
>>>> def foo(a):
>>>>     return C * a
>>>>
>>>> def bar():
>>>>     for i in range(1):
>>>>         foo(i)
>>>>
>>>> Here is a proof of concept:
>>>> https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9
>>>>
>>>> So the question is: is it worth it?
>>>
>>> I think the right thing to do here is make all module-level globals
>>> into "cdef public" attributes, i.e. C globals with getters and setters
>>> for Python space. I'm not sure whether this would best be done by
>>> creating a custom dict or module subclass, but it would probably be
>>> cleaner and afford much more than a 1.6x speedup.
>>
>> Yes, nice idea.
>> It's possible to subclass PyModuleObject and I didn't find any use of
>> PyModule_CheckExact() in CPython's sources:
>>
>> import types
>> import sys
>>
>> global_foo = 1234
>>
>> class CustomModule(types.ModuleType):
>>     def __init__(self, name):
>>         types.ModuleType.__init__(self, name)
>>         sys.modules[name] = self
>>
>>     @property
>>     def foo(self):
>>         return global_foo
>>
>>     @foo.setter
>>     def foo(self, value):
>>         global global_foo
>>         global_foo = value
>>
>> CustomModule('foo')
>>
>> import foo
>> print foo.foo
>
> The one thing I don't currently see is how to get the module subtype
> instantiated in a safe and portable way.

We can do the same as the types module does: ModuleType = type(sys), or
type(__builtins__), since we already have it (__pyx_b).

> The normal way to create the module in Python 2.x is a call to
> Py_InitModule*(), which internally does a PyImport_AddModule(). We may get
> away with creating and registering the module object before calling into
> Py_InitModule*(), so that PyImport_AddModule() finds it there. At least,
> the internal checks on modules seem to use PyModule_Check() and not
> PyModule_CheckExact(), so someone seems to have already thought about this.
>
> In Python 3.x, the situation is different. There is no lookup involved and
> the module is always newly instantiated. That may mean that we have to copy
> the module creation code into Cython. But that doesn't look like a huge
> drawback (except for compatibility to potential future changes), because we
> already do most of the module initialisation ourselves anyway, especially
> now that we have CyFunction.
>
> I start feeling a bit like Linus Torvalds when he broke his minix
> installation and went: "ok, what else do I need to add to this terminal
> emulator in order to make it an operating system?"

--
vitja.
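For what it's worth, a rough Python-level sketch of that combination (the
module name 'mymod' and the foo attribute are invented for illustration):
type(sys) yields ModuleType without importing the types module, and an
instance of a module subclass registered in sys.modules up front is exactly
what a later import statement picks up.

# Sketch only: Python-level analogue of what the generated module init
# code could do ('mymod' and 'foo' are made-up names).
import sys

ModuleType = type(sys)        # same object as types.ModuleType

_global_foo = 1234            # stands in for a C-level module global

class FastModule(ModuleType):
    @property
    def foo(self):
        return _global_foo

    @foo.setter
    def foo(self, value):
        global _global_foo
        _global_foo = value

# Register the subclass instance before anything else creates or looks up
# the module, so that the import machinery finds this object.
sys.modules['mymod'] = FastModule('mymod')

import mymod
print mymod.foo               # 1234, served by the property
mymod.foo += 1                # routed through the property setter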
Re: [Cython] Speedup module-level lookup
On Sat, Jan 21, 2012 at 2:35 AM, Vitja Makarov wrote:
> 2012/1/21 Stefan Behnel :
> > Chris Colbert, 19.01.2012 09:18:
> >> If it doesn't pass PyDict_CheckExact you won't be able to use it as the
> >> globals to eval or exec.
> >
> > What makes you say that? I tried and it worked for me, all the way back to
> > Python 2.4:
> >
> > Python 2.4.6 (#2, Jan 21 2010, 23:45:25)
> > [GCC 4.4.1] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> class MyDict(dict): pass
> > >>> eval('1+1', MyDict())
> > 2
> > >>> exec '1+1' in MyDict()
> >
> > I only see a couple of calls to PyDict_CheckExact() in CPython's sources
> > and they usually seem to be related to special casing for performance
> > reasons. Nothing that should impact a module's globals.
> >
> > Besides, Cython controls its own language usages of eval and exec.
>
> Cool!
> It seems that Python internally uses PyObject_GetItem() for module
> level lookups and not PyDict_GetItem().
> Btw, we use __Pyx_GetName(), which calls PyObject_GetAttr(); that isn't
> exactly the same as a module lookup:
>
> # Works in Cython and doesn't work in Python
> print __class__
>
> So we can override __getitem__() and __setitem__():
>
> class MyDict(dict):
>     def __init__(self):
>         self._dict = {}
>
>     def __getitem__(self, key):
>         print '__getitem__', key
>         return self._dict[key]
>
>     def __setitem__(self, key, value):
>         print '__setitem__', key, value
>         self._dict[key] = value
>
>     def __getattr__(self, key):
>         print '__getattr__'
>
> d = MyDict()
> exec('x = 1; print x', d)
> eval('x', d)
>
> $ python foo.py
> __setitem__ x 1
> __getitem__ x
> 1
> __getitem__ x
>
> So we can make globals() return a special dict with custom
> __setitem__()/__getitem__(). But it seems that we would have to override
> many of dict's standard methods, like values(), update() and so on. That
> would be hard.

Be careful. That only works because your dict subclass is being used as the
locals as well. The LOAD_NAME opcode does a PyDict_CheckExact on the locals
and will call PyDict_GetItem if true, PyObject_GetItem if false:

case LOAD_NAME:
    w = GETITEM(names, oparg);
    if ((v = f->f_locals) == NULL) {
        PyErr_Format(PyExc_SystemError,
                     "no locals when loading %s",
                     PyObject_REPR(w));
        why = WHY_EXCEPTION;
        break;
    }
    if (PyDict_CheckExact(v)) {
        x = PyDict_GetItem(v, w);
        Py_XINCREF(x);
    }
    else {
        x = PyObject_GetItem(v, w);
        if (x == NULL && PyErr_Occurred()) {
            if (!PyErr_ExceptionMatches(
                            PyExc_KeyError))
                break;
            PyErr_Clear();
        }
    }

You can see that the dict subclassing breaks down when you pass an empty
dict as the locals:

In [1]: class Foo(dict):
   ...:     def __getitem__(self, name):
   ...:         print 'get', name
   ...:         return super(Foo, self).__getitem__(name)
   ...:

In [2]: f = Foo(a=42)

In [3]: eval('a', f)
get a
Out[3]: 42

In [4]: eval('a', f, {})
Out[4]: 42
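To spell out the asymmetry, here is a simplified Python paraphrase of the
opcode above (not CPython source; the globals/builtins fallback is condensed
from the rest of the LOAD_NAME implementation): only the locals step dispatches
through the mapping protocol for non-exact dicts, while the later steps use
plain dict lookups, so a __getitem__ override on the globals is never consulted.

# Simplified Python paraphrase of LOAD_NAME (not actual CPython code).
_missing = object()   # stand-in for NULL

def load_name(name, f_locals, f_globals, f_builtins):
    if type(f_locals) is dict:                    # PyDict_CheckExact()
        value = f_locals.get(name, _missing)      # PyDict_GetItem()
    else:
        try:
            value = f_locals[name]                # PyObject_GetItem()
        except KeyError:
            value = _missing
    if value is _missing:
        value = f_globals.get(name, _missing)     # plain dict lookup
    if value is _missing:
        value = f_builtins.get(name, _missing)    # plain dict lookup
    if value is _missing:
        raise NameError("name %r is not defined" % name)
    return value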
Re: [Cython] Speedup module-level lookup
2012/1/21 Chris Colbert :
>
> On Sat, Jan 21, 2012 at 2:35 AM, Vitja Makarov wrote:
>>
>> 2012/1/21 Stefan Behnel :
>> > Chris Colbert, 19.01.2012 09:18:
>> >> If it doesn't pass PyDict_CheckExact you won't be able to use it as the
>> >> globals to eval or exec.
>> >
>> > What makes you say that? I tried and it worked for me, all the way back to
>> > Python 2.4:
>> >
>> > Python 2.4.6 (#2, Jan 21 2010, 23:45:25)
>> > [GCC 4.4.1] on linux2
>> > Type "help", "copyright", "credits" or "license" for more information.
>> > >>> class MyDict(dict): pass
>> > >>> eval('1+1', MyDict())
>> > 2
>> > >>> exec '1+1' in MyDict()
>> >
>> > I only see a couple of calls to PyDict_CheckExact() in CPython's sources
>> > and they usually seem to be related to special casing for performance
>> > reasons. Nothing that should impact a module's globals.
>> >
>> > Besides, Cython controls its own language usages of eval and exec.
>>
>> Cool!
>> It seems that Python internally uses PyObject_GetItem() for module
>> level lookups and not PyDict_GetItem().
>> Btw, we use __Pyx_GetName(), which calls PyObject_GetAttr(); that isn't
>> exactly the same as a module lookup:
>>
>> # Works in Cython and doesn't work in Python
>> print __class__
>>
>> So we can override __getitem__() and __setitem__():
>>
>> class MyDict(dict):
>>     def __init__(self):
>>         self._dict = {}
>>
>>     def __getitem__(self, key):
>>         print '__getitem__', key
>>         return self._dict[key]
>>
>>     def __setitem__(self, key, value):
>>         print '__setitem__', key, value
>>         self._dict[key] = value
>>
>>     def __getattr__(self, key):
>>         print '__getattr__'
>>
>> d = MyDict()
>> exec('x = 1; print x', d)
>> eval('x', d)
>>
>> $ python foo.py
>> __setitem__ x 1
>> __getitem__ x
>> 1
>> __getitem__ x
>>
>> So we can make globals() return a special dict with custom
>> __setitem__()/__getitem__(). But it seems that we would have to override
>> many of dict's standard methods, like values(), update() and so on. That
>> would be hard.
>
> Be careful. That only works because your dict subclass is being used as the
> locals as well. The LOAD_NAME opcode does a PyDict_CheckExact on the locals
> and will call PyDict_GetItem if true, PyObject_GetItem if false:
>
> case LOAD_NAME:
>     w = GETITEM(names, oparg);
>     if ((v = f->f_locals) == NULL) {
>         PyErr_Format(PyExc_SystemError,
>                      "no locals when loading %s",
>                      PyObject_REPR(w));
>         why = WHY_EXCEPTION;
>         break;
>     }
>     if (PyDict_CheckExact(v)) {
>         x = PyDict_GetItem(v, w);
>         Py_XINCREF(x);
>     }
>     else {
>         x = PyObject_GetItem(v, w);
>         if (x == NULL && PyErr_Occurred()) {
>             if (!PyErr_ExceptionMatches(
>                             PyExc_KeyError))
>                 break;
>             PyErr_Clear();
>         }
>     }
>
> You can see that the dict subclassing breaks down when you pass an empty
> dict as the locals:
>
> In [1]: class Foo(dict):
>    ...:     def __getitem__(self, name):
>    ...:         print 'get', name
>    ...:         return super(Foo, self).__getitem__(name)
>    ...:
>
> In [2]: f = Foo(a=42)
>
> In [3]: eval('a', f)
> get a
> Out[3]: 42
>
> In [4]: eval('a', f, {})
> Out[4]: 42

Nice catch! It seems that globals MUST be a real dict.

>>> help(eval)
eval(...)
    eval(source[, globals[, locals]]) -> value

    Evaluate the source in the context of globals and locals.
    The source may be a string representing a Python expression
    or a code object as returned by compile().
    The globals must be a dictionary and locals can be any mapping,
    defaulting to the current globals and locals.
    If only globals is given, locals defaults to it.

--
vitja.
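A quick way to see that constraint in action (a sketch; the exact TypeError
wording differs between CPython versions): an arbitrary mapping is accepted as
the locals argument but rejected as the globals argument.

# Sketch: eval() accepts any mapping as locals, but insists on a real
# dict (or dict subclass) for globals.
class Mapping(object):
    def __init__(self, data):
        self.data = data
    def __getitem__(self, key):
        return self.data[key]

print eval('a', {}, Mapping({'a': 42}))   # 42: locals may be any mapping

try:
    eval('a', Mapping({'a': 42}))          # globals must be a real dict
except TypeError as exc:
    print exc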
[Cython] AddTraceback() slows down generators
Hi,

I did some callgrind profiling on Cython's generators and was surprised to
find that AddTraceback() represents a serious performance penalty for short
running generators.

I profiled a compiled Python implementation of itertools.groupby(), which
yields (key, group) tuples where the group is an iterator again. I ran this
code in Python for benchmarking:

"""
L = sorted(range(1000)*5)

all(list(g) for k,g in groupby(L))
"""

Groups tend to be rather short in real code, often just one or a couple of
items, so unpacking the group iterator into a list will usually be a quick
loop and then the generator raises StopIteration on termination and builds
a traceback for it. According to callgrind (which, I should note, tends to
overestimate the amount of time spent in memory allocation), the iteration
during the group unpacking takes about 30% of the overall runtime of the
all() loop, and the AddTraceback() call at the end of each group traversal
takes up to 25% (!) on my side. That means that more than 80% of the group
unpacking time goes into raising StopIteration from the generators. I
attached the call graph with the relative timings.

About half of the exception raising time is eaten by PyString_FromFormat()
that builds the function-name + line-position string (which, I may note, is
basically a convenience feature). This string is a constant for a
generator's StopIteration exception, at least for each final return point
in a generator, but here it is being recreated over and over again, for
each exception that gets raised.

Even if we keep creating a new frame instance each time (which should be ok
because CPython has a frame instance cache already and we'd only create one
during the generator lifetime), the whole code object could actually be
cached after the first creation, preferably bound to the lifetime of the
generator creator function/method. Or, more generally, one code object per
generator termination point, which will be a single point in the majority
of cases. For the specific code above, that should shave off almost 20% of
the overall runtime of the all() loop.

I think that's totally worth doing.

Stefan
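To make the caching idea concrete, here is a Python-level sketch (the helper
and cache names are invented; the actual change would go into Cython's
generated C code around AddTraceback()): the formatted position string is a
constant per termination point, so it only needs to be built once and can be
looked up afterwards.

# Sketch of the caching idea in Python terms (names are made up).
_position_cache = {}

def position_string(funcname, filename, lineno):
    # Corresponds to the PyString_FromFormat() call that currently runs
    # for every single raised exception; the result never changes for a
    # given termination point, so cache it.
    key = (funcname, filename, lineno)
    try:
        return _position_cache[key]
    except KeyError:
        s = '%s (%s:%d)' % (funcname, filename, lineno)
        _position_cache[key] = s
        return s

The same memoization would apply one level up to the code object created from
that string, ideally tied to the lifetime of the function that creates the
generator.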
Re: [Cython] AddTraceback() slows down generators
On 01/21/2012 07:50 PM, Stefan Behnel wrote:
> Hi,
>
> I did some callgrind profiling on Cython's generators and was surprised to
> find that AddTraceback() represents a serious performance penalty for short
> running generators.
>
> I profiled a compiled Python implementation of itertools.groupby(), which
> yields (key, group) tuples where the group is an iterator again. I ran this
> code in Python for benchmarking:
>
> """
> L = sorted(range(1000)*5)
>
> all(list(g) for k,g in groupby(L))
> """
>
> Groups tend to be rather short in real code, often just one or a couple of
> items, so unpacking the group iterator into a list will usually be a quick
> loop and then the generator raises StopIteration on termination and builds
> a traceback for it. According to callgrind (which, I should note, tends to
> overestimate the amount of time spent in memory allocation), the iteration
> during the group unpacking takes about 30% of the overall runtime of the
> all() loop, and the AddTraceback() call at the end of each group traversal
> takes up to 25% (!) on my side. That means that more than 80% of the group
> unpacking time goes into raising StopIteration from the generators. I
> attached the call graph with the relative timings.

OT: Since you complain that callgrind is inaccurate: are you aware of
sampling profilers, such as Google perftools? (I don't have experience with
callgrind myself.)

http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html
http://pypi.python.org/pypi/yep

Dag

> About half of the exception raising time is eaten by PyString_FromFormat()
> that builds the function-name + line-position string (which, I may note, is
> basically a convenience feature). This string is a constant for a
> generator's StopIteration exception, at least for each final return point
> in a generator, but here it is being recreated over and over again, for
> each exception that gets raised.
>
> Even if we keep creating a new frame instance each time (which should be ok
> because CPython has a frame instance cache already and we'd only create one
> during the generator lifetime), the whole code object could actually be
> cached after the first creation, preferably bound to the lifetime of the
> generator creator function/method. Or, more generally, one code object per
> generator termination point, which will be a single point in the majority
> of cases. For the specific code above, that should shave off almost 20% of
> the overall runtime of the all() loop.
>
> I think that's totally worth doing.
>
> Stefan
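For reference, yep (linked above) wraps the google-perftools CPU profiler
behind a simple start/stop pair. A minimal sketch of profiling the benchmark
from the quoted mail might look like the following, assuming yep and
google-perftools are installed and using the start()/stop() API described on
the linked page (the output file name is arbitrary):

# Sketch: sampling-profile the groupby benchmark with yep/google-perftools.
import yep
from itertools import groupby   # or the compiled Cython version

L = sorted(range(1000) * 5)

yep.start('groupby.prof')       # start the sampling CPU profiler
for _ in range(100):
    all(list(g) for k, g in groupby(L))
yep.stop()                      # stop and write the profile

# The result can then be inspected with google-perftools' pprof, e.g.:
#   pprof --text `which python` groupby.prof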
Re: [Cython] AddTraceback() slows down generators
On Sat, Jan 21, 2012 at 10:50 AM, Stefan Behnel wrote:
> Hi,
>
> I did some callgrind profiling on Cython's generators and was surprised to
> find that AddTraceback() represents a serious performance penalty for short
> running generators.
>
> I profiled a compiled Python implementation of itertools.groupby(), which
> yields (key, group) tuples where the group is an iterator again. I ran this
> code in Python for benchmarking:
>
> """
> L = sorted(range(1000)*5)
>
> all(list(g) for k,g in groupby(L))
> """
>
> Groups tend to be rather short in real code, often just one or a couple of
> items, so unpacking the group iterator into a list will usually be a quick
> loop and then the generator raises StopIteration on termination and builds
> a traceback for it. According to callgrind (which, I should note, tends to
> overestimate the amount of time spent in memory allocation), the iteration
> during the group unpacking takes about 30% of the overall runtime of the
> all() loop, and the AddTraceback() call at the end of each group traversal
> takes up to 25% (!) on my side. That means that more than 80% of the group
> unpacking time goes into raising StopIteration from the generators. I
> attached the call graph with the relative timings.
>
> About half of the exception raising time is eaten by PyString_FromFormat()
> that builds the function-name + line-position string (which, I may note, is
> basically a convenience feature). This string is a constant for a
> generator's StopIteration exception, at least for each final return point
> in a generator, but here it is being recreated over and over again, for
> each exception that gets raised.
>
> Even if we keep creating a new frame instance each time (which should be ok
> because CPython has a frame instance cache already and we'd only create one
> during the generator lifetime), the whole code object could actually be
> cached after the first creation, preferably bound to the lifetime of the
> generator creator function/method. Or, more generally, one code object per
> generator termination point, which will be a single point in the majority
> of cases. For the specific code above, that should shave off almost 20% of
> the overall runtime of the all() loop.
>
> I think that's totally worth doing.

Makes sense to me. I did some caching like this for profiling.

- Robert