Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Allan Haldane wrote: > You probably already know this, but I just wanted to note that the > mpi4py module has worked around pickle too. They discuss how they > efficiently transfer numpy arrays in mpi messages here: > http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-object

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
On 05/11/2016 06:48 PM, Sturla Molden wrote: > Elliot Hallmark wrote: >> Sturla, this sounds brilliant! To be clear, you're talking about >> serializing the numpy array and reconstructing it in a way that's faster >> than pickle? > > Yes. We know the binary format of NumPy arrays. We don't need

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
On 05/11/2016 06:39 PM, Joe Kington wrote: > > > In python2 it appears that multiprocessing uses pickle protocol 0 which > must cause a big slowdown (a factor of 100) relative to protocol 2, and > uses pickle instead of cPickle. > > > Even on Python 2.x, multiprocessing uses protoco

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Feng Yu wrote: > 1. If we are talking about shared memory and copy-on-write > inheritance, then we are using 'fork'. Not available on Windows. On Unix it only allows one-way communication, from parent to child. > 2. Pickling of inherited shared memory array can be done minimally by > just picki
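The one-way nature of fork inheritance mentioned above can be illustrated with a minimal, Unix-only sketch. The function name `fork_copy_on_write_demo` and the use of a pipe to report back are assumptions made for illustration; they are not taken from the thread or from PR #7533.

```python
import os


def fork_copy_on_write_demo():
    """Show that a forked child inherits parent data, but its writes stay private.

    Hypothetical helper for illustration only. Requires os.fork (Unix).
    """
    if not hasattr(os, "fork"):
        raise RuntimeError("os.fork is not available on this platform")

    data = bytearray(b"parent data")   # ordinary private memory in the parent
    read_fd, write_fd = os.pipe()      # explicit channel for child -> parent

    pid = os.fork()
    if pid == 0:
        # Child: it sees the parent's data as of the fork ...
        os.close(read_fd)
        # ... but this write only touches the child's copy-on-write pages.
        data[:6] = b"CHILD!"
        os.write(write_fd, bytes(data))
        os.close(write_fd)
        os._exit(0)                    # skip the parent's cleanup handlers

    # Parent: collect what the child reports through the pipe.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        child_view = pipe.read()
    os.waitpid(pid, 0)

    # The parent's own buffer is unchanged by the child's write.
    return bytes(data), child_view
```

Calling `fork_copy_on_write_demo()` returns the parent's buffer and the child's view of it. The child's modification reaches the parent only because it was sent back through the pipe, which is the kind of explicit communication the thread is discussing.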

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Joe Kington wrote: > You're far better off just > communicating between processes as opposed to using shared memory. Yes.

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Benjamin Root wrote: > Oftentimes, if one needs to share numpy arrays for multiprocessing, I would > imagine that it is because the array is huge, right? That is a case for shared memory, but what I was talking about is more common than this. In order for processes to cooperate, they must commu

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Allan Haldane wrote: > That's interesting. I've also used multiprocessing with numpy and didn't > realize that. Is this true in python3 too? I am not sure. As you have noticed, pickle is faster by two orders of magnitude on Python 3. But several microseconds is also a lot, particularly if we are

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Elliot Hallmark wrote: > Sturla, this sounds brilliant! To be clear, you're talking about > serializing the numpy array and reconstructing it in a way that's faster > than pickle? Yes. We know the binary format of NumPy arrays. We don't need to invoke the machinery of pickle to serialize an arra
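As a rough illustration of serializing an array from its known binary layout rather than through pickle, here is a minimal sketch. The wire format (a small header holding the dtype string, the number of dimensions and the shape, followed by the raw buffer) and the names `pack_array` and `unpack_array` are assumptions for this example. They are not the format Sturla had in mind, nor anything from PR #7533, and the sketch only handles simple (non-structured, non-object) dtypes.

```python
import struct

import numpy as np


def pack_array(arr):
    """Serialize an array as: dtype-length, ndim, dtype string, shape, raw data.

    Hypothetical format for illustration; simple dtypes only.
    """
    arr = np.ascontiguousarray(arr)            # ensure a C-ordered buffer
    dtype_str = arr.dtype.str.encode("ascii")  # e.g. b"<f8"
    header = struct.pack("<BB", len(dtype_str), arr.ndim) + dtype_str
    header += struct.pack("<%dq" % arr.ndim, *arr.shape)
    return header + arr.tobytes()


def unpack_array(buf):
    """Rebuild the array from bytes produced by pack_array."""
    dtype_len, ndim = struct.unpack_from("<BB", buf, 0)
    offset = 2
    dtype = np.dtype(buf[offset:offset + dtype_len].decode("ascii"))
    offset += dtype_len
    shape = struct.unpack_from("<%dq" % ndim, buf, offset)
    offset += 8 * ndim
    # frombuffer wraps the payload without copying; the result is read-only
    # when buf is an immutable bytes object.
    return np.frombuffer(buf, dtype=dtype, offset=offset).reshape(shape)
```

For example, `unpack_array(pack_array(a))` should give back an array with the same shape, dtype and contents as `a`. Whether this is actually faster than pickle for a given array size would need to be measured.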

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Joe Kington
> In python2 it appears that multiprocessing uses pickle protocol 0 which > must cause a big slowdown (a factor of 100) relative to protocol 2, and > uses pickle instead of cPickle. > > Even on Python 2.x, multiprocessing uses protocol 2, not protocol 0. The default for the `pickle` module changed,
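To make the protocol 0 versus protocol 2 distinction concrete, the following standard-library sketch compares the pickled size of the same data under both protocols. A plain list of floats stands in for array data here, which is an assumption made to keep the example dependency-free; the numbers for real NumPy arrays, and the timing differences discussed in the thread, will differ. The helper name `pickled_sizes` is hypothetical.

```python
import pickle


def pickled_sizes(obj, protocols=(0, 2)):
    """Return {protocol: size in bytes} for pickling obj with each protocol."""
    return {p: len(pickle.dumps(obj, protocol=p)) for p in protocols}


# Stand-in payload: 1000 floats. Protocol 0 is a text format that writes each
# float as its decimal repr; protocol 2 writes each float as 8 binary bytes
# plus a one-byte opcode.
values = [i / 7.0 for i in range(1000)]
sizes = pickled_sizes(values)

# Both protocols round-trip the data exactly.
assert pickle.loads(pickle.dumps(values, protocol=0)) == values
assert pickle.loads(pickle.dumps(values, protocol=2)) == values
```

Printing `sizes` shows the text-based protocol 0 producing a noticeably larger payload than the binary protocol 2 for this data.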

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Feng Yu
Hi, I've been thinking and exploring this for some time. If we are to start some effort I'd like to help. Here are my comments, mostly regarding Sturla's comments. 1. If we are talking about shared memory and copy-on-write inheritance, then we are using 'fork'. If we are free to use fork, then

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Benjamin Root
Oftentimes, if one needs to share numpy arrays for multiprocessing, I would imagine that it is because the array is huge, right? So, the pickling approach would copy that array for each process, which defeats the purpose, right? Ben Root On Wed, May 11, 2016 at 2:01 PM, Allan Haldane wrote: > O

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
On 05/11/2016 04:29 AM, Sturla Molden wrote: > 4. The reason IPC appears expensive with NumPy is because multiprocessing > pickles the arrays. It is pickle that is slow, not the IPC. Some would say > that the pickle overhead is an integral part of the IPC overhead, but I > will argue that it is no

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Elliot Hallmark
Sturla, this sounds brilliant! To be clear, you're talking about serializing the numpy array and reconstructing it in a way that's faster than pickle? Or using shared memory and signaling array creation around that shared memory rather than using pickle? For what it's worth, I have used shared me

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
I did some work on this some years ago. I have more or less concluded that it was a waste of effort. But first let me explain why the suggested approach does not work. As it uses memory mapping to create shared memory (i.e. shared segments are not named), they must be created ahead of spawning proce