A non-intrusive test you could try is to replace your MPI (mpich) with a lower-latency one; Scali and MPI/GAMMA are two examples. These can bring your latency down to 15 µs or so.
gamma is highly hardware dependent. does scali really provide a latency improvement independent of hardware?
If this drastically ups your efficiency, you know where your bottleneck is.
indeed. but another alternative is to find a _SLOWER_ MPI implementation. in fact, I wonder if there's a handy place in, say, mpich, to put a simple usleep() for this purpose. perhaps just enable tracing. usleep as a tool for performance characterization!
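to make the usleep idea concrete, here is a rough sketch in Python rather than inside mpich itself. everything in it is hypothetical: `toy_workload`, `run_with_delay` and `critical_path_messages` are made-up names, and `time.sleep` stands in for a usleep() in the real send path. the point is only the arithmetic: run once normally, run again with a known delay added to every message, and divide the extra runtime by the delay to estimate how many messages sit on the critical path.

```python
import time


def run_with_delay(workload, delay_us):
    """Time workload(send), where send sleeps delay_us before each message."""
    def send(msg):
        # Stand-in for a usleep() placed in the MPI send path (assumption).
        time.sleep(delay_us / 1e6)
        return msg

    start = time.perf_counter()
    workload(send)
    return time.perf_counter() - start


def critical_path_messages(t_base, t_delayed, delay_us):
    """Estimate the number of latency-bound messages from two timings.

    If each message on the critical path costs an extra delay_us,
    the extra runtime divided by the delay approximates their count.
    """
    return (t_delayed - t_base) / (delay_us / 1e6)


def toy_workload(send, n_msgs=50):
    """Hypothetical stand-in for an application: compute, then send."""
    total = 0
    for i in range(n_msgs):
        total += sum(range(1000))  # the "compute" phase
        send(i)                    # the "communicate" phase
    return total


if __name__ == "__main__":
    delay_us = 2000
    t0 = run_with_delay(toy_workload, 0)
    t1 = run_with_delay(toy_workload, delay_us)
    print("baseline   : %.4f s" % t0)
    print("with delay : %.4f s" % t1)
    print("est. messages on critical path: %.1f"
          % critical_path_messages(t0, t1, delay_us))
```

if the runtime barely moves when the delay is added, latency is probably not the bottleneck; if it grows roughly in step with delay times message count, it probably is. in a real MPI code the delay would have to go into a C wrapper around the send calls (the MPI profiling interface, where each MPI_ routine is also reachable as PMPI_, is one plausible hook), and sleep granularity on the host limits how small a delay can be injected reliably.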