[I know in an ideal world the vendor between us and PathScale^WQlogic would sort this out.]
I'm interested in the cause (and possible cure!) of intermittent errors on various nodes in our Infinipath system which stop MPI jobs with kernel messages like this, in case anyone's familiar with them: lvinfi095:21.Hardware problem: {[RXE EAGERTID Memory Parity]} They seem to be new with an upgrade to Linux 2.6.22 from 2.6.11, but probably just manifested themselves in some other way previously. Google didn't produce any leads, and a brief look in the source suggests that tracking it down where it's generated in the ib_ipath module is non-trivial and likely won't tell me a lot. For what it's worth, the adaptors are 06:00.0 InfiniBand: PathScale, Inc InfiniPath HT-400 (rev 02) in two different sorts of Supermicro whose model numbers I don't know. Thanks for any leads. _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf