[issue20540] Python 3.3/3.4 regression in multiprocessing manager ?

2014-02-07 Thread Irvin Probst

New submission from Irvin Probst:

After checking with the kind people of [email protected] I think I found a bug in 
the way connections are handled in the multiprocessing module.

Using Python 3.3 I've noticed a performance drop of about 70 times when running 
some code performing basic requests on a SyncManager. As this code was buried 
deep inside a big project, I made a test case, attached to this report, to 
reproduce the behavior.

Here is what this code does (a rough sketch of the setup follows the list):
- define a class SharedData with several instance variables (a, b and c here)
- expose two of its methods, good() and bad(), through a proxy; both show the 
huge performance drop on 3.3 and can be used to reproduce the behavior. The only 
difference between them is that good() uses a mutex whereas bad() does not, so I 
could check that mutexes were not to blame for this problem.
- create a SyncManager giving access to a SharedData instance
- launch a multiprocessing.Process() running the do_stuff() function; this 
function calls the good() (or bad()) method of SharedData ten times through the 
SyncManager, passing some values to it and getting back the result
- after each call to the proxy, print the elapsed time, roughly measured with 
time.time()
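
A minimal sketch of what the attached test case does (test_manager.py is the 
authoritative version; the class and function names follow the description 
above, while the address, port, payloads and the default fork start method on 
Linux are assumptions of mine):

"""
import time
import threading
from multiprocessing import Process
from multiprocessing.managers import SyncManager

class SharedData:
    def __init__(self):
        self.a, self.b, self.c = 0, 0, 0
        self._lock = threading.Lock()

    def good(self, x, y):
        # same work as bad(), but under a mutex
        with self._lock:
            self.a, self.b = x, y
            return self.a + self.b

    def bad(self, x, y):
        # no mutex, to rule locking out as the cause of the slowdown
        self.a, self.b = x, y
        return self.a + self.b

shared = SharedData()

class MyManager(SyncManager):
    pass

MyManager.register('get_proxy', callable=lambda: shared,
                   exposed=('good', 'bad'))

def do_stuff():
    client = MyManager(address=('127.0.0.1', 50000), authkey=b"foo")
    client.connect()
    data_proxy = client.get_proxy()
    for _ in range(10):
        t_s = time.time()
        data_proxy.good([1, 2], [3, 4])
        print(time.time() - t_s)   # ~0.001 s on 3.2, ~0.07 s on 3.3/3.4

if __name__ == '__main__':
    manager = MyManager(address=('127.0.0.1', 50000), authkey=b"foo")
    manager.start()
    p = Process(target=do_stuff)
    p.start()
    p.join()
    manager.shutdown()
"""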

System specs:
Linux turing 3.12-1-686-pae #1 SMP Debian 3.12.6-2 (2013-12-29) i686 GNU/Linux

Python versions tested:
latest 2.6, 2.7, 3.2 and 3.3 from the standard Debian repositories
3.3.0 and 3.4.0 beta 3 compiled from source

Time elapsed in each call to the proxy using Python 2.6, 2.7 and 3.2:
first call to the proxy ~0.04 seconds, next calls ~0.001 sec

Time elapsed in each call to the proxy using Python 3.3.0, 3.3.2, 3.3.3 and 
3.4.0b3:
first call to the proxy ~0.27 seconds, next calls ~0.07 sec

I reproduced this behavior with Python 2.7 and 3.3.3 on an Ubuntu machine 
running the latest amd64 release.

Of course I tried without a print() after each call, but it does not change 
anything: Python 3.3 is *much* slower here.

Using cProfile I tracked this down to the built-in read() method, and indeed 
timing the read() syscall in posix_read() with gettimeofday() confirms it takes 
ages, while posix_write() is fine. I think the problem comes from the underlying 
socket the interpreter is reading from.
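
Roughly how the client-side profile can be obtained (a sketch, not the exact 
invocation used in the attached script; data_proxy is the proxy variable from 
the test case above):

"""
import cProfile

# profile a batch of proxy calls from inside do_stuff(); the cumulative time
# is dominated by the built-in read() on the connection's socket
cProfile.runctx('for _ in range(10): data_proxy.good([1, 2], [3, 4])',
                globals(), {'data_proxy': data_proxy}, sort='cumulative')
"""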

connection.py in multiprocessing was rewritten between 3.2 and 3.3, but I can't 
see what's wrong in the way it was done; the basic socket options seem to be 
exactly the same.

One interesting point is that it seems to only affect the last bytes sent 
through the socket: e.g. if you send a numpy.array big enough to fill the 
socket's read() buffer, the first calls to read() run at normal speed and only 
the last one takes ages.

If you confirm my test code is not to blame, this IMHO makes the SyncManager in 
3.3 and 3.4 completely unusable for frequent data exchanges between processes.

Thanks for your time.

--
components: Library (Lib)
files: test_manager.py
messages: 210457
nosy: Irvin.Probst
priority: normal
severity: normal
status: open
title: Python 3.3/3.4 regression in multiprocessing manager ?
type: performance
versions: Python 3.3, Python 3.4
Added file: http://bugs.python.org/file33956/test_manager.py

___
Python tracker <http://bugs.python.org/issue20540>



[issue20540] Python 3.3/3.4 regression in multiprocessing manager ?

2014-02-07 Thread Irvin Probst

Irvin Probst added the comment:

FWIW, according to your comments I tried a quick and dirty fix in my code, as I 
can't wait for a new Python release to get it working:

The do_stuff function now does:

"""
def do_stuff():
client=make_client('',, b"foo")
data_proxy=client.get_proxy()

#make a dummy request to get the underlying
#fd we are reading from (see bug #20540)
c=data_proxy.good([1,2],[3,4])
fd=data_proxy._tls.connection._handle

#setting TCP_NODELAY on 3.3.x should fix the delay issue until a new release
sock=socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

for i in range(10):
t_s=time.time()
c=data_proxy.good([1,2],[3,4])
print(time.time()-t_s)
print(c)
"""

I'm now down to 0.04s per request instead of ~0.08s. I guess the remaining delay 
comes from the server-side socket, which is not affected by the TCP_NODELAY set 
on the client side (a hypothetical server-side variant is sketched below).
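
A sketch of how the same option might be applied on the server side, by wrapping 
the accept() method of the private SocketListener class in 
multiprocessing.connection before the manager's server process is forked. This 
relies on an implementation detail of 3.3 and is untested; it is only meant to 
illustrate the idea:

"""
import socket
from multiprocessing import connection

_orig_accept = connection.SocketListener.accept

def _accept_nodelay(self):
    conn = _orig_accept(self)
    # duplicate the accepted fd just long enough to set TCP_NODELAY on the
    # underlying socket, then close the duplicate (the Connection keeps its fd)
    s = socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    s.close()
    return conn

# install the wrapper before calling manager.start()
connection.SocketListener.accept = _accept_nodelay
"""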

Regards.

--

___
Python tracker <http://bugs.python.org/issue20540>



[issue20626] Manager documentation unclear about lists and thread safeness

2014-02-14 Thread Irvin Probst

New submission from Irvin Probst:

In the Manager's list() documentation one can read:

"""
list()
list(sequence)
Create a shared list object and return a proxy for it.
"""

IMHO it is really unclear whether these lists offer anything more than 
traditional lists or not.

When you have a look at managers.py it is quite obvious, unless I'm completely 
wrong, that the underlying shared object is a standard list, with a basic proxy 
exposing all the "underscore underscore stuff".

"""
BaseListProxy = MakeProxyType('BaseListProxy', (
    '__add__', '__contains__', '__delitem__', '__getitem__', '__len__',
    '__mul__', '__reversed__', '__rmul__', '__setitem__',
    'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove',
    'reverse', 'sort', '__imul__'
    ))

class ListProxy(BaseListProxy):
    def __iadd__(self, value):
        self._callmethod('extend', (value,))
        return self
    def __imul__(self, value):
        self._callmethod('__imul__', (value,))
        return self

[snip a couple of lines]

SyncManager.register('list', list, ListProxy)
"""

That's really confusing because, unless you read managers.py, you get the 
feeling that the manager's list() objects are somehow different from standard 
lists.

The other problem is that, if you don't know what level of thread safety the GIL 
guarantees on the lists in the manager's server thread, you get the feeling that 
the safety comes from some obscure Manager black magic managing concurrent 
access for you (see the sketch after this paragraph).
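
To illustrate the point (an example of mine, not taken from the attached file): 
each individual proxied call is handled as a single request by the manager's 
server, but a read-modify-write spread over several proxy calls is not atomic, 
so the developer still has to lock around it:

"""
from multiprocessing import Manager, Process

def unsafe_increment(shared_list, n):
    for _ in range(n):
        shared_list[0] = shared_list[0] + 1      # two proxy calls: racy

def safe_increment(shared_list, lock, n):
    for _ in range(n):
        with lock:                               # developer-provided locking
            shared_list[0] = shared_list[0] + 1

if __name__ == '__main__':
    manager = Manager()
    lst = manager.list([0])
    lock = manager.Lock()
    procs = [Process(target=safe_increment, args=(lst, lock, 1000))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(lst[0])   # 4000 with the lock; usually less with unsafe_increment
    manager.shutdown()
"""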

May I suggest adding the following to the documentation:


1/

"""
list()
list(sequence)
Create a shared list (add here a link to 
http://docs.python.org/3.3/library/stdtypes.html#list) object and return a 
proxy for it.
"""

2/ 

Clearly state somewhere in the manager's documentation that it is the 
developer's job to ensure that the proxied methods are thread safe. Write it 
in bold and red please :-) 

3/ 
Perhaps add an example with a custom object, like the code I attached to this 
report.


Thanks for your time.

--
files: example.py
messages: 211220
nosy: Irvin.Probst
priority: normal
severity: normal
status: open
title: Manager documentation unclear about lists and thread safeness
Added file: http://bugs.python.org/file34080/example.py

___
Python tracker <http://bugs.python.org/issue20626>



[issue20626] Manager documentation unclear about lists and thread safeness

2014-02-14 Thread Irvin Probst

Changes by Irvin Probst:


--
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python
versions: +Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5

___
Python tracker <http://bugs.python.org/issue20626>