reassign 598553 openmpi
thanks

On 2 October 2010 at 08:39, Zack Weinberg wrote:
| On Sat, Oct 2, 2010 at 6:01 AM, Manuel Prinz <man...@debian.org> wrote:
| >> On 29 September 2010 at 18:22, Zack Weinberg wrote:
| >> | (on an 8-core machine), CPU utilization jumps *immediately* from 98% idle
| >> | to 20% user, 70% system, 12% idle. strace reveals that each slave is
| >> | spinning through poll() calls with timeout zero, rather than blocking
| >> | until a message arrives, as the documentation for mpi.probe() suggests
| >> | should happen.
| ...
| > Well, no. Actually, this behavior is by design. I'm not sure about the details
| > exactly but can get back to Jeff if you're interested in those. This is coming
| > up every now and then in the BTS or the user list. Open MPI is basically burning
| > every free cycle that is not used for computation (busy wait). There are no
| > immediate plans of changing that, as far as I know.
|
| Well I do think this is a design error in OpenMPI. There are plenty
| of use cases where an OpenMPI cluster might legitimately go idle for
| some time, and the CPU should be doing something other than
| busy-waiting.
|
| The one _I_ care about is, I'm debugging a large genetic optimization
| that needs to be parallelized for runs to finish in a reasonable
| amount of time, so I want the cluster _available_ all the time (I
| don't want to have to do startCluster/stopCluster for every run) but
| the CPU should go to sleep when I'm not doing a run, so the fan quiets
| down and I can hear myself think.
|
| Another, similar scenario is when the same machine is time-shared
| among several clusters each dedicated to a particular task, which only
| runs when jobs come in. When any given cluster is not doing any work
| it should not busy-wait, because that puts unnecessary load on the
| scheduler.
|
| Also, I'd not be surprised if busy-waiting here actually made message
| receive latency _worse_ due to scheduler thrashing.
|
| zw

I will let the two of you sort this out. Rmpi is simply standing in the
middle, talking to Open MPI.

Zack: We didn't have a decent MPICH2 in Debian for ages, which is why I
always defaulted to LAM and then Open MPI for Rmpi. You could try a local
Rmpi package, or a direct installation to /usr/local/lib/R/site-packages,
of Rmpi built against MPICH2 if Open MPI bugs you too much.

Dirk

-- 
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
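
For readers who want to see the difference Zack describes between spinning on poll() with a zero timeout and blocking until data arrives, here is a minimal sketch in plain Python. It does not use Open MPI or Rmpi at all: a pipe stands in for the message channel, and the names busy_wait, blocking_wait and measure are made up for this illustration. It assumes a Unix-like system where select.poll is available.

```python
import os
import select
import threading
import time


def busy_wait(fd):
    """Spin on poll() with a zero timeout until fd is readable.

    Returns the number of poll() calls that were needed.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    calls = 0
    while True:
        calls += 1
        if poller.poll(0):  # timeout 0: returns at once, ready or not
            return calls


def blocking_wait(fd):
    """Block inside poll() until fd is readable.

    Returns the number of poll() calls that were needed.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    calls = 0
    while True:
        calls += 1
        if poller.poll(None):  # None: sleep in the kernel until fd is ready
            return calls


def measure(wait, delay=0.2):
    """Run wait() on a pipe that receives one byte after `delay` seconds.

    Returns (number of poll() calls, CPU seconds used by this process).
    """
    r, w = os.pipe()
    timer = threading.Timer(delay, os.write, args=(w, b"x"))
    timer.start()
    cpu_before = time.process_time()
    calls = wait(r)
    cpu_used = time.process_time() - cpu_before
    timer.join()
    os.close(r)
    os.close(w)
    return calls, cpu_used


if __name__ == "__main__":
    for wait in (busy_wait, blocking_wait):
        calls, cpu = measure(wait)
        print(f"{wait.__name__}: {calls} poll() calls, {cpu:.3f}s CPU")
```

The busy-wait variant typically makes a very large number of poll() calls and burns CPU for the whole wait, while the blocking variant makes a single call and stays idle. Whether Open MPI can be made to behave like the second variant is exactly the question left open in the thread.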