Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
I did some work on this some years ago. I have more or less concluded that
it was a waste of effort. But first let me explain why the suggested
approach does not work. Because it uses memory mapping to create shared
memory (i.e. the shared segments are not named), the segments must be
created ahead of spawning processes. But if you really want this to work
smoothly, you want named shared memory (Sys V IPC or POSIX shm_open), so
that shared arrays can be created in the spawned processes and passed back.
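
As a concrete illustration of the named-segment idea: the
multiprocessing.shared_memory module that landed much later, in Python 3.8,
exposes exactly this, so the following retrospective sketch shows a spawned
child attaching to a segment by name (names and sizes are illustrative):

import numpy as np
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory  # Python 3.8+

def worker(name, shape, dtype):
    # The spawned child attaches to the named segment and fills it.
    shm = SharedMemory(name=name)
    a = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    a[:] = 42.0
    shm.close()

if __name__ == "__main__":
    shm = SharedMemory(create=True, size=1000 * 8)
    a = np.ndarray((1000,), dtype=np.float64, buffer=shm.buf)
    p = Process(target=worker, args=(shm.name, a.shape, "float64"))
    p.start(); p.join()
    assert a[0] == 42.0   # the parent sees the child's writes
    shm.close(); shm.unlink()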

Now for the reason I don't care about shared memory arrays anymore, and
what I am currently working on instead:

1. I have come across very few cases where threaded code cannot be used in
numerical computing. In fact, multithreading nearly always happens in the
code where I write pure C or Fortran anyway. Most often it happens in
library code that is already multithreaded (Intel MKL, Apple Accelerate
Framework, OpenBLAS, etc.), which means using it requires no extra effort
from my side. A multithreaded LAPACK library is not less multithreaded if I
call it from Python.

2. Getting shared memory right can be difficult because of hierarchical
memory and false sharing. You might not see it if you only have a multicore
CPU with a shared cache. But your code might not scale up on computers with
more than one physical processor. False sharing acts like the GIL, except
it happens in hardware and affects your C code invisibly without any
explicit locking you can pinpoint. This is also why MPI code tends to scale
much better than OpenMP code. If nothing is shared there will be no false
sharing.

3. Raw C level IPC is cheap – very, very cheap. Even if you use pipes or
sockets instead of shared memory it is cheap. There are very few cases
where the IPC tends to be a bottleneck. 
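
For a rough sense of scale, here is a self-contained sketch (illustrative,
not a rigorous benchmark) that times pushing raw bytes through a
multiprocessing pipe:

import time
from multiprocessing import Pipe, Process

def consumer(conn, n):
    # Drain n messages from the pipe.
    for _ in range(n):
        conn.recv_bytes()

if __name__ == "__main__":
    parent, child = Pipe()
    payload = b"x" * (8 * 1024 * 1024)   # 8 MB per message
    n = 50
    p = Process(target=consumer, args=(child, n))
    p.start()
    t0 = time.time()
    for _ in range(n):
        parent.send_bytes(payload)
    p.join()   # the consumer has received everything
    print("%.0f MB/s" % (n * len(payload) / (time.time() - t0) / 1e6))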

4. The reason IPC appears expensive with NumPy is that multiprocessing
pickles the arrays. It is pickle that is slow, not the IPC. Some would say
that the pickle overhead is an integral part of the IPC overhead, but I
will argue that it is not. The slowness of pickle is a separate problem
altogether.

5. Shared memory does not improve on the pickle overhead, because NumPy
arrays backed by shared memory must also be pickled. Multiprocessing can
bypass pickling the RawArray object, but the rest of the NumPy array is
still pickled. Using shared memory arrays therefore has no speed advantage
over normal NumPy arrays when we use multiprocessing.

6. It is much easier to write concurrent code that uses queues for message
passing than anything else. That is why using a Queue object has been the
popular Pythonic approach to both multithreading and multiprocessing. I
would like this to continue.

I am therefore focusing my effort on the multiprocessing.Queue object. If
you understand the six points I listed, you will see where this is going:
what we really need is a specialized queue that has knowledge about NumPy
arrays and can bypass pickle. That is why I am working on a NumPy-aware
queue object.
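
To make the idea concrete, here is a minimal sketch of the send/receive
half of such a queue (illustrative only, not the actual implementation): a
tiny header that is cheap to pickle, followed by the raw bytes of the
array, so the pickle machinery never touches the data.

import numpy as np
from multiprocessing import Pipe

def send_array(conn, a):
    a = np.ascontiguousarray(a)
    conn.send((a.dtype.str, a.shape))   # small header, cheap to pickle
    conn.send_bytes(a.tobytes())        # raw payload, no pickle machinery

def recv_array(conn):
    dtype, shape = conn.recv()
    data = conn.recv_bytes()
    return np.frombuffer(data, dtype=dtype).reshape(shape)

if __name__ == "__main__":
    parent, child = Pipe()
    send_array(parent, np.arange(10.0))
    print(recv_array(child))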

We are not doing the users a favor by encouraging the use of shared memory
arrays. They help with nothing.


Sturla Molden



Matěj Týč wrote:
> Dear Numpy developers,
> I propose a pull request https://github.com/numpy/numpy/pull/7533 that
> features numpy arrays that can be shared among processes (with some
> effort).
> 
> Why:
> In CPython, multiprocessing is the only way to exploit multi-core CPUs
> if your parallel code can't avoid creating Python objects. In that case,
> CPython's GIL makes threads unusable. However, unlike with threading,
> sharing data among processes is non-trivial and platform-dependent.
> 
> Although numpy (and certainly some other packages) implement some
> operations in a way that the GIL is not a concern, consider another case:
> you have a large amount of data in the form of a numpy array and you
> want to pass it to a function of an arbitrary Python module that also
> expects a numpy array (e.g. a list of vertex coordinates as input and an
> array of the corresponding polygons as output). Here, it is clear that
> the GIL is an issue, and since you want a numpy array on both ends, you
> would have to copy your numpy array to a multiprocessing.Array (to pass
> the data) and then convert it back to an ndarray in the worker process.
> This contribution would streamline it a bit - you would create an
> array as you are used to, pass it to the subprocess as you would do
> with the multiprocessing.Array, and the process can work with a numpy
> array right away.
> 
> How:
> The idea is to create a numpy array in a buffer that can be shared
> among processes. Python has support for this in its standard library,
> so the current solution creates a multiprocessing.Array and then
> passes it as the "buffer" to ndarray.__new__. That would be it on
> Unix, but on Windows there has to be a custom pickle method; otherwise
> the array "forgets" that its buffer is special and the sharing doesn't
> work.
> 
> Some of what h
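
As a minimal sketch of the mechanism described in the quoted proposal (an
ndarray view over a multiprocessing shared buffer; illustrative, not the
PR's actual API):

import multiprocessing
import numpy as np

# A lock-free shared buffer of 100 doubles, backed by an anonymous
# shared memory map that child processes inherit.
shared = multiprocessing.Array('d', 100, lock=False)

# An ndarray view over that buffer; no copy is made.
a = np.frombuffer(shared, dtype=np.float64)
a[:] = 1.0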

Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Elliot Hallmark
Sturla, this sounds brilliant!  To be clear, you're talking about
serializing the numpy array and reconstructing it in a way that's faster
than pickle? Or using shared memory and signaling array creation around
that shared memory rather than using pickle?

For what it's worth, I have used shared memory with numpy arrays as IPC (no
queue), with one process writing to it and one process reading from it, and
liked it.  Your point #5 did not apply because I was reusing the shared
memory.

Do you have a public repo where you are working on this?

Thanks!
  Elliot



Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
On 05/11/2016 04:29 AM, Sturla Molden wrote:
> 4. The reason IPC appears expensive with NumPy is that multiprocessing
> pickles the arrays. It is pickle that is slow, not the IPC. Some would say
> that the pickle overhead is an integral part of the IPC overhead, but I
> will argue that it is not. The slowness of pickle is a separate problem
> altogether.

That's interesting. I've also used multiprocessing with numpy and didn't
realize that. Is this true in python3 too?

In python2 it appears that multiprocessing uses pickle protocol 0 which
must cause a big slowdown (a factor of 100) relative to protocol 2, and
uses pickle instead of cPickle.

import numpy as np
import pickle, cPickle

a = np.arange(40*40)

%timeit pickle.dumps(a)
1000 loops, best of 3: 1.63 ms per loop

%timeit cPickle.dumps(a)
1000 loops, best of 3: 1.56 ms per loop

%timeit cPickle.dumps(a, protocol=2)
10000 loops, best of 3: 18.9 µs per loop

Python 3 uses protocol 3 by default:

%timeit pickle.dumps(a)
10000 loops, best of 3: 20 µs per loop




Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Benjamin Root
Oftentimes, if one needs to share numpy arrays for multiprocessing, I would
imagine that it is because the array is huge, right? So, the pickling
approach would copy that array for each process, which defeats the purpose,
right?

Ben Root



Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Feng Yu
Hi,

I've been thinking about and exploring this for some time. If we are to
start some effort I'd like to help. Here are my comments, mostly in
response to Sturla's.

1. If we are talking about shared memory and copy-on-write inheritance,
then we are using 'fork'. If we are free to use fork, then a large chunk
of the concerns regarding the python std library multiprocessing is no
longer relevant, especially the limitation that worker functions must be
defined at module level, which tends to impose special requirements on the
software design.

2. Pickling of an inherited shared memory array can be done minimally by
just pickling the __array_interface__ and the pointer address. This works
because the child process and the parent share the same address space
layout, guaranteed by the fork call.
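
A sketch of that minimal pickle-under-fork idea (array_from_address is a
hypothetical helper; the raw address is only meaningful because a forked
child inherits the parent's address space):

import ctypes
import numpy as np

def array_from_address(addr, shape, dtype):
    # Hypothetical helper: rebuild an ndarray view from a raw address.
    # Only valid in a process that shares the owner's address space,
    # e.g. a forked child.
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    buf = (ctypes.c_char * nbytes).from_address(addr)
    return np.frombuffer(buf, dtype=dtype).reshape(shape)

a = np.arange(16.0)
b = array_from_address(a.ctypes.data, a.shape, a.dtype)
assert (a == b).all()   # b is a view of a's memory, not a copy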

3. The RawArray and RawValue implementation in std multiprocessing has
its own memory allocator for managing small variables. It is huge
overkill (in terms of implementation) if we only care about very large
memory chunks.

4. Hidden synchronization costs on multi-CPU (NUMA?) systems. One choice
is to defer the responsibility of avoiding races to the developer.
Simple structs for working on slices of an array in parallel can cover a
huge fraction of use cases and fully avoid this issue.

5. Whether to delegate parallelism to the underlying low-level
implementation, or to implement the parallelism in Python while keeping
the underlying low-level implementation sequential, probably depends on
the problem. It may be convenient, given the current state of parallelism
support in Python, to delegate, but will that always be the case?

For example, after the MPI FFTW binding had been stuck for a long time,
someone wrote a parallel python FFT package
(https://github.com/spectralDNS/mpiFFT4py) that uses FFTW for the
sequential parts, writes all of the parallel semantics in Python with
mpi4py, and uses a more efficient domain decomposition.

6. If we are to define a set of operations I would recommend taking a
look at OpenMP as a reference -- it has been out there for decades and is
used widely. An equivalent to the 'omp parallel for' construct in
Python would be a very good starting point and immediately useful.
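
A bare-bones sketch of such a construct (Unix-only, fork-based,
hypothetical; each child writes only to its own slice of a shared buffer,
so no locking is needed):

import os
import multiprocessing
import numpy as np

nworkers = 4
shared = multiprocessing.Array('d', 1000000, lock=False)
a = np.frombuffer(shared, dtype=np.float64)

pids = []
chunk = a.size // nworkers
for lo in range(0, a.size, chunk):
    pid = os.fork()
    if pid == 0:
        hi = min(lo + chunk, a.size)
        a[lo:hi] = np.sin(np.arange(lo, hi))   # work on this slice only
        os._exit(0)                            # the child stops here
    pids.append(pid)
for pid in pids:
    os.waitpid(pid, 0)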

- Yu


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Joe Kington
> In python2 it appears that multiprocessing uses pickle protocol 0 which
> must cause a big slowdown (a factor of 100) relative to protocol 2, and
> uses pickle instead of cPickle.
>
>
Even on Python 2.x, multiprocessing uses protocol 2, not protocol 0.  The
default for the `pickle` module changed, but multiprocessing has always
used a binary pickle protocol to communicate between processes.  Have a
look at multiprocessing's forking.py in Python 2.7.

As some context here for folks that may not be aware, Sturla is referring
to his earlier shared memory implementation, which avoids actually
pickling the data and instead essentially pickles a pointer to an array in
shared memory.  As Sturla very nicely summed up, it saves memory usage,
but doesn't help the deeper issues.  You're far better off just
communicating between processes as opposed to using shared memory.


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Elliot Hallmark  wrote:
> Sturla, this sounds brilliant!  To be clear, you're talking about
> serializing the numpy array and reconstructing it in a way that's faster
> than pickle?

Yes. We know the binary format of NumPy arrays. We don't need to invoke
the machinery of pickle to serialize an array and write the bytes to some
IPC mechanism (pipe, TCP socket, Unix socket, shared memory). The choice
of IPC mechanism might not even be relevant, and could be deferred to a
library like ZeroMQ. The point is that if multiple processes are to
cooperate efficiently, we need a way to let them communicate NumPy arrays
quickly. That is where using multiprocessing hurts today, and shared
memory does not help.
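
For instance, with pyzmq (an external library; this is a sketch of the
well-known pattern from its docs, not a finished design), the header can
go as one message frame and the raw buffer as another, so pickle never
touches the data:

import json
import numpy as np
import zmq  # pyzmq

def send_array(sock, a):
    a = np.ascontiguousarray(a)
    header = json.dumps({"dtype": a.dtype.str,
                         "shape": a.shape}).encode()
    sock.send_multipart([header, a])   # the array goes out as a raw buffer

def recv_array(sock):
    header, payload = sock.recv_multipart()
    meta = json.loads(header.decode())
    return np.frombuffer(payload,
                         dtype=meta["dtype"]).reshape(meta["shape"])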

Sturla



Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Allan Haldane  wrote:

> That's interesting. I've also used multiprocessing with numpy and didn't
> realize that. Is this true in python3 too?

I am not sure. As you have noticed, pickle is faster by two orders of
magnitude on Python 3. But several microseconds is still a lot,
particularly if we are going to do this often during a computation.

Sturla



Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Benjamin Root  wrote:

> Oftentimes, if one needs to share numpy arrays for multiprocessing, I would
> imagine that it is because the array is huge, right? 

That is a case for shared memory, but what I was talking about is more
common than this. In order for processes to cooperate, they must
communicate. So we need a way to pass around NumPy arrays quickly.
Sometimes we want to use shared memory because of the size of the data,
but more often it is just used as a form of inexpensive IPC.

> So, the pickling
> approach would copy that array for each process, which defeats the purpose,
> right?

I am not sure what you mean. When I made shared memory arrays I used
named segments, and made sure only the names of the segments were pickled,
not the contents of the buffers.
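
Something along those lines (a retrospective sketch using the
shared_memory module from later Python versions; the original used Sys V
IPC, and the names here are illustrative):

import numpy as np
from multiprocessing import shared_memory  # Python 3.8+

def _attach(name, shape, dtype):
    shm = shared_memory.SharedMemory(name=name)
    a = SharedArray(shape, dtype=dtype, buffer=shm.buf)
    a._shm = shm
    return a

class SharedArray(np.ndarray):
    # An ndarray living in a named segment that pickles as
    # (segment name, shape, dtype), never the buffer contents.
    @classmethod
    def create(cls, shape, dtype=np.float64):
        nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
        shm = shared_memory.SharedMemory(create=True, size=nbytes)
        a = cls(shape, dtype=dtype, buffer=shm.buf)
        a._shm = shm
        return a

    def __reduce__(self):
        return (_attach, (self._shm.name, self.shape, self.dtype.str))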

Sturla



Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Joe Kington  wrote:

> You're far better off just
> communicating between processes as opposed to using shared memory.

Yes.



Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Feng Yu  wrote:

> 1. If we are talking about shared memory and copy-on-write
> inheritance, then we are using 'fork'. 

Not available on Windows. On Unix it only allows one-way communication,
from parent to child.

> 2. Pickling of an inherited shared memory array can be done minimally by
> just pickling the __array_interface__ and the pointer address. This works
> because the child process and the parent share the same address space
> layout, guaranteed by the fork call.

Again, not everyone uses Unix.

And on Unix it is not trivial to pass data back from the child process. I
solved that problem with Sys V IPC (pickling the name of the segment).

> 6. If we are to define a set of operations I would recommend take a
> look at OpenMP as a reference -- It has been out there for decades and
> used widely. An equiavlant to the 'omp parallel for' construct in
> Python will be a very good starting point and immediately useful.

If you are on Unix, you can just use a context manager: call os.fork in
__enter__ and os.waitpid in __exit__.
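
Roughly like this (a Unix-only, illustrative sketch):

import os

class forked(object):
    # Fork in __enter__; the with-block then runs in both processes,
    # so the caller guards the work with the returned flag.
    def __enter__(self):
        self.pid = os.fork()
        return self.pid == 0           # True only in the child

    def __exit__(self, *exc):
        if self.pid == 0:
            os._exit(0)                # the child stops after the block
        os.waitpid(self.pid, 0)        # the parent waits for the child
        return False

with forked() as is_child:
    if is_child:
        print("work happens in the child, pid %d" % os.getpid())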

Sturla



Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
On 05/11/2016 06:39 PM, Joe Kington wrote:
> 
>> In python2 it appears that multiprocessing uses pickle protocol 0 which
>> must cause a big slowdown (a factor of 100) relative to protocol 2, and
>> uses pickle instead of cPickle.
> 
> Even on Python 2.x, multiprocessing uses protocol 2, not protocol 0.
> The default for the `pickle` module changed, but multiprocessing has
> always used a binary pickle protocol to communicate between processes.
> Have a look at multiprocessing's forking.py in Python 2.7.

Are you sure? As far as I understood the code, it uses the default
protocol 0. The file forking.py also no longer exists.

https://github.com/python/cpython/tree/master/Lib/multiprocessing
(see reduction.py and queue.py)
http://bugs.python.org/issue23403


Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Allan Haldane
You probably already know this, but I just wanted to note that the
mpi4py module has worked around pickle too. They discuss how they
efficiently transfer numpy arrays in mpi messages here:
http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data

Of course not everyone is able to install mpi easily.
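
For reference, the basic pattern from the mpi4py documentation looks like
this (the upper-case Send/Recv methods use the buffer interface directly,
so the array bytes are never pickled):

# run with: mpiexec -n 2 python demo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

if comm.rank == 0:
    data = np.arange(100, dtype=np.float64)
    comm.Send(data, dest=1, tag=77)    # buffer interface, no pickle
elif comm.rank == 1:
    data = np.empty(100, dtype=np.float64)
    comm.Recv(data, source=0, tag=77)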




Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

2016-05-11 Thread Sturla Molden
Allan Haldane  wrote:

> You probably already know this, but I just wanted to note that the
> mpi4py module has worked around pickle too. They discuss how they
> efficiently transfer numpy arrays in mpi messages here:
> http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data

Unless I am mistaken, they use the PEP 3118 buffer interface to support
NumPy as well as a number of other Python objects. However, this protocol
makes buffer acquisition an expensive operation. You can see this in
Cython if you use typed memory views: assigning a NumPy array to a typed
memoryview (i.e. buffer acquisition) is slow. They are correct that
avoiding pickle saves some memory. It also avoids creating and destroying
temporary Python objects, and the associated reference counting. However,
because of the expensive buffer acquisition, I am not sure how much faster
their approach will be. I prefer to use the NumPy C API and bypass any
unnecessary overhead. The idea is to make IPC of NumPy arrays fast, and
then we cannot have an expensive buffer acquisition in there.

Sturla
