Lombard, David N wrote:
>> I'll try all the usual things (reduce the optimization level, etc.).
>> Sage words of advice (and clue sticks) welcome.
> Not trying to sound like an ad...
> The currently shipping Intel Trace Collector and Analyzer (7.1) includes
> message correctness checking. An option is available that adds a
> library to an Intel MPI build that checks messages during the run.
> You can then view any errors it found in the Intel Trace Analyzer.
> This may reveal a problem that has only just started to trip the
> code up. I certainly have welts from those; I suspect others do too.
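To make "checks messages during the run" a bit more concrete, here is a toy Python model of the kind of error a message checker can report: a send and its matching receive that disagree on datatype, a receive buffer too small for the message (truncation), or calls that never find a partner. This is a sketch under stated assumptions, not how the Intel Trace Collector is implemented. The `Message` record, the `check_matches` function and the error strings are hypothetical, and real MPI matching also involves communicators, wildcards (MPI_ANY_SOURCE, MPI_ANY_TAG) and non-blocking calls, all ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical record of one MPI send or receive call."""
    src: int      # sending rank
    dst: int      # receiving rank
    tag: int
    dtype: str    # datatype name, e.g. "MPI_DOUBLE"
    count: int    # element count (receive side: buffer capacity)


def check_matches(sends, recvs):
    """Toy correctness check: pair each receive, in posting order, with the
    earliest unmatched send that has the same (src, dst, tag) envelope,
    then flag datatype mismatches, truncation and unmatched calls."""
    pending = list(sends)
    errors = []
    for r in recvs:
        envelope = (r.src, r.dst, r.tag)
        match = next(
            (s for s in pending if (s.src, s.dst, s.tag) == envelope), None
        )
        if match is None:
            errors.append(f"recv {r.src}->{r.dst} tag {r.tag}: no matching send")
            continue
        pending.remove(match)
        if match.dtype != r.dtype:
            errors.append(
                f"recv {r.src}->{r.dst} tag {r.tag}: datatype mismatch "
                f"(sent {match.dtype}, received as {r.dtype})"
            )
        elif match.count > r.count:
            errors.append(
                f"recv {r.src}->{r.dst} tag {r.tag}: message truncated "
                f"({match.count} sent, buffer holds {r.count})"
            )
    # Any send left over was never matched by a receive.
    for s in pending:
        errors.append(f"send {s.src}->{s.dst} tag {s.tag}: never received")
    return errors


if __name__ == "__main__":
    sends = [Message(0, 1, 7, "MPI_DOUBLE", 10), Message(0, 1, 8, "MPI_INT", 4)]
    recvs = [Message(0, 1, 7, "MPI_FLOAT", 10), Message(0, 1, 8, "MPI_INT", 2)]
    for err in check_matches(sends, recvs):
        print(err)
```

Run as a script, the example reports a datatype mismatch on tag 7 and a truncated message on tag 8. Bugs like these can go unnoticed for a long time (e.g. when the sizes happen to line up on one platform) and then start tripping the code up, which is why a checker that watches every message at run time is useful.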
Actually, Intel MPI and related tools are, in general, among the things
we want to try. The user may be open to that (especially if it is more
pain-free than the alternative).
We now have reliable, functional non-sm/non-ib-based execution on multiple
machines. A new code drop is coming, so we have to wait for that. Once
we have it, we'll be doing more testing.
Joe
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web : http://www.scalableinformatics.com
http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf