> >> Since we have some users that need shared memory, but we also want
> >> to build a normal cluster for MPI apps, we think that this could be
> >> a solution. Let's say about 8 machines (96 processors) plus
> >> InfiniBand. Does that sound correct? I'm aware of the bottleneck
> >> that comes from having one IB interface for all the MPI cores; is
> >> there any possibility of bonding?
>
> > Bonding (or multi-rail) does not make sense with "standard IB" in
> > PCIe x8, since the PCIe connection already limits the transfer rate
> > of a single IB link.
>
> PCIe x8 Gen2 provides additional bandwidth, as Gilad said. On Opteron
> systems that is not available yet (and won't be for some time), so you
> may want to search for AMD-CPU or Intel-CPU based boards that have
> PCIe x16 slots.
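To make the slot-versus-link arithmetic concrete, here is a quick
back-of-envelope calculation. The numbers are nominal encoding-level
data rates, not measured MPI bandwidth, and the 80% PCIe
protocol-efficiency factor is only my rough assumption:

/* pcie_vs_ib.c -- rough, nominal numbers only (not measured MPI bandwidth).
 * Per-lane PCIe rates are the post-8b/10b data rates; the 0.80 efficiency
 * factor for protocol overhead is an assumption, not a measurement. */
#include <stdio.h>

int main(void)
{
    const double pcie_gen1_lane = 250.0;  /* MB/s per lane: 2.5 GT/s, 8b/10b */
    const double pcie_gen2_lane = 500.0;  /* MB/s per lane: 5.0 GT/s, 8b/10b */
    const double pcie_eff       = 0.80;   /* assumed TLP/flow-control overhead */
    const double ib_4x_ddr_peak = 2000.0; /* MB/s: 4 lanes x 5 Gb/s, 8b/10b */

    printf("PCIe x8  Gen1, ~80%% efficient: %4.0f MB/s\n",  8 * pcie_gen1_lane * pcie_eff);
    printf("PCIe x16 Gen1, ~80%% efficient: %4.0f MB/s\n", 16 * pcie_gen1_lane * pcie_eff);
    printf("PCIe x8  Gen2, ~80%% efficient: %4.0f MB/s\n",  8 * pcie_gen2_lane * pcie_eff);
    printf("IB 4x DDR nominal data rate:    %4.0f MB/s\n", ib_4x_ddr_peak);
    return 0;
}

With those rough numbers a single 4x DDR HCA already outruns a PCIe x8
Gen1 slot, which is the point made above, while x16 Gen1 or x8 Gen2
slots leave real headroom; that also lines up with the roughly
1500 MB/s per x8 slot and 3200 MB/s per Gen2 slot figures below.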
One more useful piece of info: there are a couple of installations in
Japan where they use 4 "regular IB DDR" adapters in 4 PCIe x8 slots to
provide 6 GB/s (1500 MB/s per slot), and they do bonding to have it as
a single pipe. If you plan to use Intel, you can use PCIe Gen2 with IB
QDR and get 3200 MB/s per PCIe Gen2 slot.

> > My hint would be to go for InfiniPath from QLogic or the new
> > ConnectX from Mellanox, since message rate is probably your limiting
> > factor and those technologies have a huge advantage over standard
> > InfiniBand SDR/DDR.
>
> I agree that message rate may be your limiting factor.
> Results with QLogic (aka InfiniPath) DDR adapters:
>
> DDR Adapter          Peak MPI Bandwidth   Peak Message Rate
>                                           (no message coalescing**)
> QLE7280 PCIe x16     1950 MB/s            20-26* Million/sec (8 ppn)
> QLE7240 PCIe x8      1500 MB/s            19 Million/sec (8 ppn)
>
> Test details: All run on two nodes, each with 2x Intel Xeon 5410
> (Harpertown, quad-core, 2.33 GHz CPUs), 8 cores per node, SLES 10;
> except:
> * 26 M messages/sec requires faster CPUs, 3 to 3.2 GHz.
>
> 8 ppn means 8 MPI processes per node. The non-coalesced message rate
> performance of these adapters scales pretty linearly from 1 to 8
> cores. That is not the case with all modern DDR adapters.

As Tom wrote, the message rate depends on the number of CPUs. With the
benchmark Tom indicated below and the same CPUs, you can get up to 42M
msg/sec with ConnectX.

> Benchmark = OSU Multiple Bandwidth, Message Rate benchmark
> (osu_mbw_mr.c). The above performance results can be had with either
> MVAPICH 1.0 or QLogic MPI 2.2 (other MPIs are in the same ballpark
> with these adapters).
>
> Note that MVAPICH 0.9.9 had message coalescing on by default, and
> MVAPICH 1.0 has it off by default. There must be a reason.

As far as I know, the reason for that was to let the user make the
choice. As OSU mentioned, there are some applications where this helps
and some where it does not.

Gilad.

> Revisiting:
>
> > Bonding (or multi-rail) does not make sense with "standard IB" in
> > PCIe x8, since the PCIe connection already limits the transfer rate
> > of a single IB link.
>
> Some 4-socket motherboards have independent PCIe buses to x8 or x16
> slots. In this case, multi-rail does make sense. You can run the
> QLogic adapters as dual-rail without bonding. On MPI applications,
> half of the cores will use one adapter and half will use the other.
> Whether the more expensive dual-rail arrangement is necessary and/or
> cost-effective would be very application-specific.
>
> Regards,
> -Tom Elken
>
> > InfiniPath and ConnectX are available as DDR InfiniBand and provide
> > a bandwidth of more than 1800 MB/s.
>
> Good suggestion.
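P.S. For anyone who wants to reproduce the message-rate numbers above:
the benchmark Tom refers to is osu_mbw_mr.c from the OSU
micro-benchmarks. The sketch below is not the OSU code, just a
stripped-down illustration of the same measurement pattern (paired
senders and receivers, windows of non-blocking small messages,
aggregate messages/sec); the message size, window and iteration counts
are arbitrary choices:

/* msgrate.c -- stripped-down sketch of a multi-pair message-rate test,
 * loosely following the pattern of osu_mbw_mr.c (this is NOT the OSU code).
 * Assumes an even number of ranks with block placement, so that ranks
 * 0..N/2-1 (senders) sit on one node and the rest (receivers) on the
 * other; 8 ranks per node mimics the "8 ppn" numbers quoted above. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_SIZE 8      /* small messages, so per-message overhead dominates */
#define WINDOW   64     /* non-blocking sends kept in flight per iteration   */
#define ITERS    10000

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int pairs     = size / 2;
    int is_sender = (rank < pairs);
    int peer      = is_sender ? rank + pairs : rank - pairs;

    char *buf = calloc(MSG_SIZE, 1);
    MPI_Request req[WINDOW];

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < ITERS; i++) {
        if (is_sender) {
            for (int w = 0; w < WINDOW; w++)
                MPI_Isend(buf, MSG_SIZE, MPI_CHAR, peer, 0,
                          MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            /* zero-byte ack keeps sender and receiver in lockstep */
            MPI_Recv(NULL, 0, MPI_CHAR, peer, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            for (int w = 0; w < WINDOW; w++)
                MPI_Irecv(buf, MSG_SIZE, MPI_CHAR, peer, 0,
                          MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(NULL, 0, MPI_CHAR, peer, 1, MPI_COMM_WORLD);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double msgs = (double)pairs * ITERS * WINDOW;
        printf("%d pairs, %d-byte messages: %.1f million messages/sec\n",
               pairs, MSG_SIZE, msgs / (t1 - t0) / 1.0e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Whether the aggregate rate keeps climbing as you add sender ranks per
node is exactly the per-adapter difference being discussed above.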