OK ... thought this would be interesting for some folks. As a reminder, I'm using Open MPI 1.2.6 for a customer code and seeing different behavior than in the past. I've been scratching my head over it, since the failures are seemingly non-deterministic.

I tried using '--mca btl ^sm' (turn off shared memory usage) on the non-InfiniBand machine, and ... it runs. Repeatedly. To completion.

OK, over to the InfiniBand machine. I tried using '--mca btl ^sm'. No dice (the tcp and openib BTLs are still available).

Next I tried turning off tcp (Ethernet):

        --mca btl ^sm,tcp

Nope. Still doesn't work right. Hmmm.... One left. Turn off openib (InfiniBand):

        --mca btl ^sm,openib

Yup.  It works.  Repeatedly.  To completion.
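
A rough sketch of how this kind of elimination could be scripted, for anyone who wants to repeat it. This is not the set of commands actually used above: the application name (./customer_app), the process count, the run count, and the DRY_RUN switch are all placeholders invented for illustration. Only mpirun, its -np option, and the '--mca btl' exclusion lists come from the post or from Open MPI itself. By default the sweep only prints the command lines it would run.

```shell
#!/bin/sh
# Sketch: run the same MPI job several times under each BTL exclusion list.
# Hypothetical names (not from the original post): ./customer_app, NP, RUNS, DRY_RUN.

MPIRUN="${MPIRUN:-mpirun}"
APP="${APP:-./customer_app}"   # hypothetical application binary
NP="${NP:-4}"                  # hypothetical process count
RUNS="${RUNS:-5}"              # repetitions per setting, since failures are intermittent

# Print the command line for one BTL setting, e.g. build_cmd '^sm,openib'.
build_cmd() {
    printf '%s --mca btl %s -np %s %s\n' "$MPIRUN" "$1" "$NP" "$APP"
}

# For each exclusion list, print the command (DRY_RUN=1, the default)
# or actually run it (DRY_RUN=0) and report the exit status.
btl_sweep() {
    for btl in '^sm' '^sm,tcp' '^sm,openib'; do
        i=1
        while [ "$i" -le "$RUNS" ]; do
            if [ "${DRY_RUN:-1}" = 1 ]; then
                build_cmd "$btl"
            elif "$MPIRUN" --mca btl "$btl" -np "$NP" "$APP" >/dev/null 2>&1; then
                echo "btl=$btl run=$i OK"
            else
                echo "btl=$btl run=$i FAILED"
            fi
            i=$((i + 1))
        done
    done
}
```

Calling btl_sweep prints the planned commands; DRY_RUN=0 btl_sweep runs them. Note that this only catches non-zero exit codes. A job that hangs instead of crashing would stall the loop, so wrapping the mpirun call in something like coreutils timeout may be needed.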

Since the code runs once sm and openib are both turned off, it looks like this is an MPI stack issue of some sort. I'll ping the Open MPI list and see what they think.

Thanks to all for the suggestions and comments.

FWIW, I also pulled down the DDT tool from Allinea, with the thought of testing it, and seeing if I could figure out where the problem was with the code.

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
