Sorry for the delay in answering. I'll try to address all points:

1. Yes, the busy-poll design is intentional in Open MPI. :-(

1a. Yes, it probably does cause some performance degradation when used with TCP.

1b. It quite definitely is a (major) performance win for non-TCP networks. That's (unfortunately) why it's there -- you can't poll/select/epoll/whatever for these non-TCP kinds of networks (e.g., OpenFabrics networks) without killing performance. So you have to busy-poll those networks with their native poll functions and then periodically select/poll/epoll/whatever all file descriptors. This unfortunately became a central architecture point for Open MPI's progression engine (because it's in the performance-critical code path).
2. The behavior you're seeing with yield_when_idle is also intentional. We're busy polling, but we're yielding so that we play well with others. It does not in any way reduce the CPU utilization; it just makes Open MPI share the CPU better. But it got somewhat weakened when sched_yield() lost its meaning in recent kernels.

3. We do know how to make our progression engine switch between blocking and busy-polling (i.e., we've had many discussions about it over the years -- shared memory message passing is the Big Problem). But no one has ever had the time / resources / motivation to implement it. If anyone has some time, I would love to explain what would need to be done (it's not rocket science, but it is a bit tricky and will require getting into some minutiae in the guts of Open MPI :-\ ).

Does that help at least explain why the code is the way it is?

On Oct 2, 2010, at 6:30 PM, Manuel Prinz wrote:

> On Sat, Oct 02, 2010 at 01:37:42PM -0700, Zack Weinberg wrote:
>> I wrote a test MPI program that just calls MPI_Probe() once - this
>> should block forever, since there are no sends happening. When run
>> with
>>
>> $ mpirun -np 2 ./a.out
>>
>> MPI_Probe never returns and the processes spin through poll(), which
>> is what I originally reported. So far so good. If I change the
>> invocation to
>>
>> $ mpirun -np 2 --mca mpi_yield_when_idle 1 ./a.out
>>
>> the behavior is the same, except that the processes alternate between
>> poll() and sched_yield(). This doesn't help anything; the scheduler
>> is still being thrashed, and the CPU is not allowed to go idle. [In
>> fact, my understanding of the Linux scheduler is that a zero-timeout
>> poll() counts as a yield, so "Aggressive" mode isn't even doing
>> anything constructive!]
>>
>> The desired behavior is for an idle cluster's processes to BLOCK in
>> poll(). So mpi_yield_when_idle does not do what I want.
>>
>> Also, putting "mpi_yield_when_idle = 1" into
>> ~/.openmpi/mca-params.conf has no effect, contra the documentation --
>> this perhaps ought to be its own bug. (I can set MCA parameters for R
>> with environment variables, but that's not nearly as convenient as the
>> host file.)
>
> I'm out of ideas here. Jeff, could you please comment on the issue?
> You can find the full log here:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=598553
>
> Thanks in advance!
>
> Best regards,
> Manuel

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/