[issue20540] Python 3.3/3.4 regression in multiprocessing manager ?
New submission from Irvin Probst:

After checking with the kind people of [email protected] I think I found a bug in the way connections are handled in the multiprocessing module. Using Python 3.3 I've noticed a performance drop of about 70 times when running some code performing basic requests on a SyncManager. As this code was buried deep in a big project, I made a test case, attached to this report, to reproduce the behavior.

Here is what this code does:
- define a class SharedData with several instance variables (a, b and c here);
- this class has two methods exposed through a proxy, good() and bad(); both see a huge performance drop using 3.3 and can be used to reproduce this behavior. The only difference is that good() uses a mutex whereas bad() does not; I wished to check that mutexes were not to blame for this problem;
- create a SyncManager giving access to a SharedData instance;
- launch a multiprocessing.Process() running the do_stuff() function; this function calls the good() (or bad()) method of SharedData 10 times through the SyncManager, passes some values to it and gets back the result;
- after each call to the proxy, the time elapsed, roughly measured with time.time(), is printed.

System specs: Linux turing 3.12-1-686-pae #1 SMP Debian 3.12.6-2 (2013-12-29) i686 GNU/Linux
Python versions: latest 2.6, 2.7, 3.2 and 3.3 from standard Debian repos; 3.3.0 and 3.4.0 beta 3 compiled from source.

Time elapsed in each call to the proxy using Python 2.6, 2.7 and 3.2: first call to the proxy ~0.04 seconds, next calls ~0.001 sec.
Time elapsed in each call to the proxy using Python 3.3.0, 3.3.2, 3.3.3 and 3.4.0b3: first call to the proxy ~0.27 seconds, next calls ~0.07 sec.

I reproduced this behavior using Python 2.7 and 3.3.3 on an Ubuntu computer running the latest amd64 release. Of course I tried without a print() after each call, but it does not change anything; Python 3.3 is *much* slower here.

Using cProfile I tracked this down to the builtin read() method, and indeed timing the read() syscall in posix_read() using gettimeofday() confirms it takes ages; posix_write() is fine though. I think the problem comes from the underlying socket the interpreter is reading from. connection.py from multiprocessing has been rewritten between 3.2 and 3.3, but I can't see what's wrong in the way it has been done; basic socket options seem to be exactly the same.

One interesting point is that it seems to only affect the last bytes sent through the socket, e.g. if you send a numpy.array big enough to fill the socket's read() buffer, the first calls to read() are done at a normal speed; only the last one takes time.

If you confirm my test code is not to blame, it makes IMHO the SyncManager in 3.3 and 3.4 completely unusable for frequent data exchanges between processes.

Thanks for your time.

--
components: Library (Lib)
files: test_manager.py
messages: 210457
nosy: Irvin.Probst
priority: normal
severity: normal
status: open
title: Python 3.3/3.4 regression in multiprocessing manager ?
type: performance
versions: Python 3.3, Python 3.4
Added file: http://bugs.python.org/file33956/test_manager.py

___
Python tracker
<http://bugs.python.org/issue20540>
___
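[Editor's note] For readers without the attached test_manager.py, here is a minimal sketch of the setup described above. The class, method and function names (SharedData, good(), bad(), do_stuff()) follow the description in the message; the address, port and authkey are illustrative placeholders, and the lambda-based registration relies on the fork start method (Linux, as in the report).

"""
import multiprocessing
import threading
import time
from multiprocessing.managers import SyncManager


class SharedData:
    def __init__(self):
        self.a = self.b = self.c = None
        self._lock = threading.Lock()

    def good(self, x, y):
        # mutex-protected variant
        with self._lock:
            self.a, self.b = x, y
            self.c = sum(x) + sum(y)
            return self.c

    def bad(self, x, y):
        # same work, no mutex
        self.a, self.b = x, y
        self.c = sum(x) + sum(y)
        return self.c


shared = SharedData()


class DataManager(SyncManager):
    pass


# the lambda keeps returning the same instance; this works because the
# server process inherits the registry via fork
DataManager.register('get_data', callable=lambda: shared,
                     exposed=('good', 'bad'))

ADDRESS = ('127.0.0.1', 50000)   # placeholder, not from the report
AUTHKEY = b'foo'


def do_stuff():
    client = DataManager(address=ADDRESS, authkey=AUTHKEY)
    client.connect()
    data_proxy = client.get_data()
    for _ in range(10):
        t_s = time.time()
        c = data_proxy.good([1, 2], [3, 4])
        print(time.time() - t_s)
        print(c)


if __name__ == '__main__':
    server = DataManager(address=ADDRESS, authkey=AUTHKEY)
    server.start()
    p = multiprocessing.Process(target=do_stuff)
    p.start()
    p.join()
    server.shutdown()
"""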
[issue20540] Python 3.3/3.4 regression in multiprocessing manager ?
Irvin Probst added the comment:
FWIW, according to your comments I tried a quick and dirty fix in my code as I
can't wait for a new Python release to make it work:
The do_stuff function now does:
"""
def do_stuff():
client=make_client('',, b"foo")
data_proxy=client.get_proxy()
#make a dummy request to get the underlying
#fd we are reading from (see bug #20540)
c=data_proxy.good([1,2],[3,4])
fd=data_proxy._tls.connection._handle
#setting TCP_NODELAY on 3.3.x should fix the delay issue until a new release
sock=socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
for i in range(10):
t_s=time.time()
c=data_proxy.good([1,2],[3,4])
print(time.time()-t_s)
print(c)
"""
I'm now down to 0.04s per request instead of ~0.08s; I guess the remaining
delay comes from the server-side socket, which has not been affected by the
TCP_NODELAY set on the client side.
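[Editor's note] For reference, a sketch (not from the original report) of the same TCP_NODELAY trick applied to both ends when you control them directly through multiprocessing.connection. It does not touch the manager's internal server socket; it only shows the general pattern. Address and authkey are placeholders.

"""
import socket
from multiprocessing.connection import Client, Listener

ADDRESS = ('127.0.0.1', 6000)   # placeholder values
AUTHKEY = b'foo'


def set_nodelay(conn):
    # fromfd() dup()s the descriptor, but both descriptors refer to the
    # same underlying socket, so the option applies to the connection itself
    sock = socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    finally:
        sock.close()


# server side
# listener = Listener(ADDRESS, authkey=AUTHKEY)
# conn = listener.accept()
# set_nodelay(conn)

# client side
# conn = Client(ADDRESS, authkey=AUTHKEY)
# set_nodelay(conn)
"""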
Regards.
--
___
Python tracker
<http://bugs.python.org/issue20540>
___
[issue20626] Manager documentation unclear about lists and thread safeness
New submission from Irvin Probst:
In the Manager's list documentation one can read:
"""
list()
list(sequence)
Create a shared list object and return a proxy for it.
"""
IMHO it is really unclear whether these lists have something more than
traditional lists or not.
When you have a look at managers.py it is quite obvious, unless I'm completely
wrong, that the underlying shared object is a standard list with a basic proxy
to expose all the "underscore underscore stuff".
"""
BaseListProxy = MakeProxyType('BaseListProxy', (
'__add__', '__contains__', '__delitem__', '__getitem__', '__len__',
'__mul__', '__reversed__', '__rmul__', '__setitem__',
'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove',
'reverse', 'sort', '__imul__'
))
class ListProxy(BaseListProxy):
def __iadd__(self, value):
self._callmethod('extend', (value,))
return self
def __imul__(self, value):
self._callmethod('__imul__', (value,))
return self
[snip a couple of lines]
SyncManager.register('list', list, ListProxy)
"""
That's really confusing because, unless you read managers.py, you have the
feeling that the manager's list() objects are somehow different from standard lists.
The other problem is that, if you don't know what level of thread-safeness the
GIL guarantees on the lists in the manager's server thread, you have the
feeling that the safeness comes from some obscure Manager's black magic
managing concurrent access for you.
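[Editor's note] To make the ambiguity concrete, a small sketch not taken from this report: each individual proxied call is a single request handled in the manager process and is roughly as safe as the corresponding list method under the GIL, but a read-modify-write sequence made of several proxy calls is not atomic, so the apparent "black magic" does not extend across calls.

"""
from multiprocessing import Manager, Process


def worker(shared_list):
    for _ in range(1000):
        # two separate proxy calls: __getitem__ then __setitem__;
        # another process can run in between, so increments get lost
        shared_list[0] = shared_list[0] + 1


if __name__ == '__main__':
    manager = Manager()
    lst = manager.list([0])
    procs = [Process(target=worker, args=(lst,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(lst[0])   # usually well below 4000
    manager.shutdown()
"""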
May I suggest adding to the documentation:
1/
"""
list()
list(sequence)
Create a shared list (add here a link to
http://docs.python.org/3.3/library/stdtypes.html#list) object and return a
proxy for it.
"""
2/
Clearly state somewhere in the manager's documentation that it's the
developer's job to ensure that the proxied methods are thread-safe. Write it
in bold and red please :-)
3/
Perhaps add an example with a custom object like the code I attached to this
report (a rough sketch of what such an example could look like follows below).
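[Editor's note] A rough sketch, not the attached example.py, of what such an example might look like: a custom class registered on a SyncManager, where the class itself takes care of locking, since client requests are served by their own threads inside the manager process. Names (Counter, MyManager) are illustrative.

"""
import threading
from multiprocessing.managers import SyncManager


class Counter:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()    # the developer's job, not the manager's

    def add(self, n):
        with self._lock:                 # protects the read-modify-write
            self._value += n
            return self._value


class MyManager(SyncManager):
    pass


MyManager.register('Counter', Counter, exposed=('add',))

if __name__ == '__main__':
    manager = MyManager()
    manager.start()
    counter = manager.Counter()          # proxy to a Counter living in the manager process
    print(counter.add(5))
    manager.shutdown()
"""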
Thanks for your time.
--
files: example.py
messages: 211220
nosy: Irvin.Probst
priority: normal
severity: normal
status: open
title: Manager documentation unclear about lists and thread safeness
Added file: http://bugs.python.org/file34080/example.py
___
Python tracker
<http://bugs.python.org/issue20626>
___
[issue20626] Manager documentation unclear about lists and thread safeness
Changes by Irvin Probst:

--
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python
versions: +Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5

___
Python tracker
<http://bugs.python.org/issue20626>
___
