Jon Forrest wrote:
> Bill Broadley wrote:
>
>> My first suggested sanity test would be to test latency and bandwidth
>> to ensure you are getting IB numbers.  So 80-100MB/sec and 30-60us for
>> a small packet would imply GigE.  6-8 times the bandwidth certainly
>> would imply SDR or better.  Latency varies quite a bit among
>> implementations; I'd try to get within 30-40% of advertised latency
>> numbers.
>
> For those of us who aren't familiar with IB utilities,
> could you give some examples of the commands you'd use
> to do this?
>
> Thanks,
> Jon
Here are two that I use:

http://cse.ucdavis.edu/bill/relay.c
http://cse.ucdavis.edu/bill/mpi_nxnlatbw.c

(Rough sketches of what each of these does are appended at the end of
this message.)

To compile, assuming a sane environment:

mpicc -O3 relay.c -o relay

The command to run an MPI program varies by MPI implementation and
batch queue environment (especially with tight integration), but it
should be something close to:

mpirun -np <number of nodes> -machinefile <list of nodes> ./relay 1
mpirun -np <number of nodes> -machinefile <list of nodes> ./relay 1024
mpirun -np <number of nodes> -machinefile <list of nodes> ./relay 8192

You should see something like:

c0-8 c0-22 size=    1, 16384 hops, 2 nodes in 0.75 sec ( 45.97 us/hop)     85 KB/sec
c0-8 c0-22 size= 1024, 16384 hops, 2 nodes in 2.00 sec (121.94 us/hop)  32803 KB/sec
c0-8 c0-22 size= 8192, 16384 hops, 2 nodes in 6.21 sec (379.05 us/hop)  84421 KB/sec

So basically: on a tiny packet, about 45us of latency (normal for GigE),
and on a large packet, about 84MB/sec (also normal for GigE).  I'd start
with 2 nodes, then, if you are happy, try it with all nodes.

Now for InfiniBand you should see something like:

c0-5 c0-4 size=    1, 16384 hops, 2 nodes in 0.03 sec (  1.72 us/hop)    2274 KB/sec
c0-5 c0-4 size= 1024, 16384 hops, 2 nodes in 0.16 sec (  9.92 us/hop)  403324 KB/sec
c0-5 c0-4 size= 8192, 16384 hops, 2 nodes in 0.50 sec ( 30.34 us/hop) 1054606 KB/sec

Note that the latency is some 25 times lower and the bandwidth some 10+
times higher.  Note also that the hostnames are different: don't run
multiple copies on the same node unless you intend to.  Running 4 copies
on a 4-CPU node doesn't test InfiniBand.

Once you get what you expect, I'd suggest something a bit more
comprehensive, such as:

mpirun -np <number of nodes> -machinefile <list of nodes> ./mpi_nxnlatbw

I'd expect some differences in latency and bandwidth between nodes, but
nothing large.  Something like:

[0<->1] 1.85us 1398.825264 (MillionBytes/sec)
[0<->2] 1.75us 1300.812337 (MillionBytes/sec)
[0<->3] 1.76us 1396.205242 (MillionBytes/sec)
[0<->4] 1.68us 1398.647324 (MillionBytes/sec)
[1<->0] 1.82us 1375.550155 (MillionBytes/sec)
[1<->2] 1.69us 1397.936020 (MillionBytes/sec)
...

Once those numbers are consistent and where you expect them (both
latency and bandwidth), I'd follow up with a production code that
produces a known answer and is likely to provide much wider MPI
coverage.
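For the curious: the sketch below is roughly what a test like relay.c
measures.  It is NOT the actual relay.c (grab that from the URL above);
the hop count, message-size units, and output format here are made up
for illustration, and only ranks 0 and 1 do any work.

/* Minimal MPI ping-pong latency/bandwidth sketch (illustration only,
 * not the real relay.c).  Rank 0 bounces a buffer of 'size' bytes back
 * and forth with rank 1 and reports per-hop time and throughput. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs, i;
    int size = (argc > 1) ? atoi(argv[1]) : 1;  /* message size in bytes */
    int hops = 16384;                           /* total one-way messages */
    double t0, t1;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (nprocs < 2) {
        if (rank == 0)
            fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    buf = malloc(size);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();

    /* each iteration is one round trip = 2 hops */
    for (i = 0; i < hops / 2; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    t1 = MPI_Wtime();
    if (rank == 0)
        printf("size=%d  %d hops  %.2f us/hop  %.0f KB/sec\n",
               size, hops,
               (t1 - t0) * 1e6 / hops,
               size * (double)hops / (t1 - t0) / 1024.0);

    free(buf);
    MPI_Finalize();
    return 0;
}

Save it as, say, pingpong.c (the name is arbitrary), then compile and
run it the same way as relay above:

mpicc -O3 pingpong.c -o pingpong
mpirun -np 2 -machinefile <list of nodes> ./pingpong 8192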

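And mpi_nxnlatbw.c is, at its core, the same idea applied to every pair
of ranks.  Again, the sketch below is NOT the actual program: it only
measures small-message latency (a real bandwidth number would use a much
larger buffer), and the repetition count is arbitrary.

/* All-pairs latency sketch (illustration only, not the real
 * mpi_nxnlatbw.c).  Every ordered pair of ranks does a short ping-pong
 * while the rest wait at the barrier, and rank i prints the per-hop
 * latency it sees to rank j. */
#include <mpi.h>
#include <stdio.h>

#define REPS 1000   /* round trips per pair; arbitrary */

int main(int argc, char **argv)
{
    int rank, nprocs, i, j, r;
    char byte = 0;
    double t0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (i = 0; i < nprocs; i++) {
        for (j = 0; j < nprocs; j++) {
            if (i == j)
                continue;
            /* keep only one pair active at a time */
            MPI_Barrier(MPI_COMM_WORLD);
            t0 = MPI_Wtime();
            for (r = 0; r < REPS; r++) {
                if (rank == i) {
                    MPI_Send(&byte, 1, MPI_CHAR, j, 0, MPI_COMM_WORLD);
                    MPI_Recv(&byte, 1, MPI_CHAR, j, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else if (rank == j) {
                    MPI_Recv(&byte, 1, MPI_CHAR, i, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(&byte, 1, MPI_CHAR, i, 0, MPI_COMM_WORLD);
                }
            }
            /* a round trip is 2 hops, so divide by 2*REPS */
            if (rank == i)
                printf("[%d<->%d] %.2f us\n", i, j,
                       (MPI_Wtime() - t0) * 1e6 / (2.0 * REPS));
        }
    }

    MPI_Finalize();
    return 0;
}

Run it across all the nodes the same way as above.  The output format
won't match mpi_nxnlatbw.c exactly, but the per-pair pattern you're
looking for (consistent numbers, no slow outliers) is the same idea.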