Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-24 Thread Scott Atchley
On Feb 23, 2010, at 6:16 PM, Brice Goglin wrote: > Greg Lindahl wrote: >>> now that I'm inventorying ignorance, I don't really understand why RDMA >>> always seems to be presented as a big hardware issue. wouldn't it be >>> pretty easy to define an eth or IP-level protocol to do remote puts, >>

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-24 Thread Brice Goglin
Greg Lindahl wrote: >> now that I'm inventorying ignorance, I don't really understand why RDMA >> always seems to be presented as a big hardware issue. wouldn't it be >> pretty easy to define an eth or IP-level protocol to do remote puts, >> gets, even test-and-set or reduce primitives, where th

Re: [Beowulf] Re: RAM ECC errors (Henning Fehrmann)

2010-02-24 Thread Mark Hahn
Strangely enough, panic_on_ue is off by default. this seems to be version-dependent (we have a bunch of HP XC clusters that have panic_on_ue (and log_ce) enabled by default). I didn't check the sources to see whether HP had patched this, though. On some apparently broken hardware we have a rat

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-24 Thread Håkon Bugge
Hi Greg, On Feb 23, 2010, at 23:32 , Greg Lindahl wrote: > A traditional MPI implementation uses N QPs x N processes, so the > global number of QPs is N^2. InfiniPath's pm library for MPI uses a > much smaller endpoint than a QP. Using a ton of QPs does slow down > things (hurts scaling), and th

Re: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops

2010-02-24 Thread Robert Horton
On Tue, 2010-02-23 at 13:23 -0600, Rahul Nabar wrote: > In the interest of latency minimum switch hops make sense and for that > loops might sometimes provide the best solution. Using STP won't give you a latency advantage; it just disables some links in a network with loops so you have a single s