Re: [Beowulf] IPoIB failure

2015-01-23 Thread Lennart Karlsson
On 01/23/2015 02:39 PM, Bill Wichser wrote: We had a strange event last night. Our IB fabric started demonstrating some odd routing behavior over IB. Host A could ping both B and C, yet B and C could not ping one another. This was only at the IP layer. ibping tests all worked fine. A few r

Re: [Beowulf] 10G networking?

2015-01-23 Thread Carsten Aulbert
Hi Mark. On 01/23/2015 07:36 AM, Mark Hahn wrote: > It seems like 10gT is on the cusp of real volume-type prices: I saw a > quote today for a major-vendor 24pt switch for something like $140/port > after educational discounts. I know 10gT is somewhat higher latency > than other PHYs - or is it wo

Re: [Beowulf] 10G networking?

2015-01-23 Thread Gavin W. Burris
On 12:41PM Fri 01/23/15 -0500, Joe Landman wrote: > Interesting. I'm taking my ball and going home for today. Thanks for the info. -- Gavin W. Burris Senior Project Leader for Research Computing The Wharton School University of Pennsylvania

Re: [Beowulf] 10G networking?

2015-01-23 Thread Joe Landman
On 01/23/2015 12:14 PM, Gavin W. Burris wrote: On 12:04PM Fri 01/23/15 -0500, Joe Landman wrote: Don't use ping for HPC latency/throughput tests. What is your preferred benchmark, Joe? Interesting. Ping isn't a benchmark. It's a very basic test of connectivity, packet transfer and routin
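If a real IP-layer latency microbenchmark is wanted instead (this is only one common choice, not necessarily the one Joe has in mind, and the hostname hpcc002-i is a placeholder), netperf's TCP_RR request/response test is a reasonable sketch:

    $ netserver                         # on the target node, starts the netperf daemon
    $ netperf -H hpcc002-i -t TCP_RR    # on the test node, small request/response loop

The reported transaction rate is the inverse of the mean round-trip time, so it measures small-message latency directly rather than inferring it from flood-ping inter-packet gaps.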

Re: [Beowulf] 10G networking?

2015-01-23 Thread Gavin W. Burris
On 12:04PM Fri 01/23/15 -0500, Joe Landman wrote: > Don't use ping for HPC latency/throughput tests. What is your preferred benchmark, Joe? -- Gavin W. Burris Senior Project Leader for Research Computing The Wharton School University of Pennsylvania

Re: [Beowulf] 10G networking?

2015-01-23 Thread Joe Landman
On 01/23/2015 12:03 PM, Gavin W. Burris wrote: min packet ping-pong latency is the standard metric, though packets/sec rate might be interesting as well. # ping -q -s 1 -f hpcc001-i -c 1000000 1000000 packets transmitted, 1000000 received, 0% packet loss, time 47132ms ipg/ewma 0.047

Re: [Beowulf] 10G networking?

2015-01-23 Thread Gavin W. Burris
> min packet ping-pong latency is the standard metric, though packets/sec > rate might be interesting as well. # ping -q -s 1 -f hpcc001-i -c 1000000 1000000 packets transmitted, 1000000 received, 0% packet loss, time 47132ms ipg/ewma 0.047/0.000 ms 1000000/47132 = 21.217 packets/ms Go
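Sanity-checking those flood-ping numbers (assuming the run really was 1,000,000 packets, as the ratio implies):

    1,000,000 packets / 47,132 ms ≈ 21.2 packets/ms
    47,132 ms / 1,000,000 packets ≈ 0.047 ms, matching the reported ipg/ewma

Since ping -f sends the next packet as soon as the previous reply returns, that ~47 us gap is roughly the IP-layer round-trip time on this link.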

Re: [Beowulf] 10G networking?

2015-01-23 Thread Gavin W. Burris
Hi, Mark. Here are some tests. Let me know if you'd like something specific. # lspci |grep Ether 01:00.0 Ethernet controller: Intel Corporation 82599 10 Gigabit Dual Port Backplane Connection (rev 01) 01:00.1 Ethernet controller: Intel Corporation 82599 10 Gigabit Dual Port Backplane C
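To make numbers like these comparable between sites, it helps to record the link and tuning state of the 82599 alongside them; ethtool covers the settings that most affect small-packet latency (the interface name eth2 is only a placeholder):

    $ ethtool eth2       # negotiated speed and link state
    $ ethtool -k eth2    # offloads (TSO/GSO/GRO/LRO)
    $ ethtool -c eth2    # interrupt coalescing (rx-usecs / tx-usecs)

Aggressive interrupt coalescing alone can add tens of microseconds to ping-pong latency, so it is worth noting before quoting figures.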

Re: [Beowulf] 10G networking?

2015-01-23 Thread Douglas Eadline
Mark, I have been pleasantly surprised by the performance of Chelsio. My tests are very small scale (no switch) and the hardware I'm testing is for my Limulus desk-side cluster. I ran NetPIPE TCP and got latencies of 9 us back-to-back (SFP+). Hopping through a two-port adapter I got 18 us. I the
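For anyone wanting to reproduce a back-to-back NetPIPE TCP run like this, the basic invocation is as follows (the hostname node01 is a placeholder):

    $ NPtcp              # on the receiving node
    $ NPtcp -h node01    # on the transmitting node, pointing at the receiver

NetPIPE sweeps message sizes from a few bytes upward, and the small-message end of that sweep is where latency figures like the 9 us and 18 us above come from.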

Re: [Beowulf] 10G networking?

2015-01-23 Thread Mark Hahn
> We have been using 10g CX4 for the past 7yrs for our cluster core network and are just going fully into 10gT. But our focus is more on bandwidth/HTC instead of HPC, i.e. we do not care so much about latencies. I'm sorry if I was unclear: I'm interested specifically in whether recent development

Re: [Beowulf] 10G networking?

2015-01-23 Thread Joe Landman
On 01/23/2015 01:36 AM, Mark Hahn wrote: Hi all, I'd appreciate any comments about the state of 10G as a reasonable cluster network. Have you done any recent work on 10G performance? https://lwn.net/Articles/629155/ shows some excellent evidence-based work on kernel paths, but it seems focus

Re: [Beowulf] IPoIB failure

2015-01-23 Thread Bill Wichser
Thanks for the suggestion. Yes, we have been through that. Nothing new, though not clean either. But the issues complained about in the file have been there for some time, so I don't believe this file sheds any light. Of course watching this as the switch infrastructure returned was quite noi

Re: [Beowulf] IPoIB failure

2015-01-23 Thread John Hearns
Do you see anything interesting in the opensm logs on that server? In the past I have found looking through opensm logs to be tough going though - generally full of verbose messages which don't mean a lot. Maybe if you could track down the time when the problem first started and look in the ope
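One way to narrow an opensm log down to the interesting window, assuming the common default location /var/log/opensm.log and adjusting the timestamp pattern to the log's own format:

    $ grep -c ERR /var/log/opensm.log         # how many error-class lines are present at all
    $ grep 'Jan 22 23:' /var/log/opensm.log   # everything logged during the hour the trouble started

Comparing the error counts before and after the event at least shows whether the SM noticed anything at the time.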

[Beowulf] IPoIB failure

2015-01-23 Thread Bill Wichser
We had a strange event last night. Our IB fabric started demonstrating some odd routing behavior over IB. Host A could ping both B and C, yet B and C could not ping one another. This was only at the IP layer. ibping tests all worked fine. A few runs of ibdiagnet produced all the switches a
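A minimal sketch of that cross-check, with hostB and hostC standing in for the two nodes that cannot reach each other (the hostnames, the ib0 interface, and the GUID below are all placeholders):

    $ ping -c 3 hostC                  # from hostB: IP layer over IPoIB
    $ ip neigh show dev ib0            # from hostB: IPoIB neighbour/ARP cache entries
    $ ibping -S                        # on hostC: start the ibping responder
    $ ibping -G 0x0002c903000f1234     # from hostB: verbs-level ping to hostC's port GUID (see ibstat)

If ibping succeeds while the IP ping fails, the fabric routing itself is healthy and the fault is in the IPoIB layer (stale neighbour entries or multicast group membership), which is consistent with the symptoms described here.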

Re: [Beowulf] 10G networking?

2015-01-23 Thread Gavin W. Burris
Hi, Mark. We just launched a new cluster, all with 10gigE. What kind of numbers are you looking for? Most of our workload is data analysis or iterations of a simulation with commercial software, though we do validate and support Open MPI. We just opened the doors and are still working the kinks
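Since Open MPI is validated there, one easy small-message figure to compare across 10GbE sites is the OSU latency test (this assumes the OSU Micro-Benchmarks are installed; node names are placeholders):

    $ mpirun -np 2 --host node01,node02 --mca btl tcp,self osu_latency

Forcing the tcp,self BTL keeps the measurement on the Ethernet path; over plain 10GbE TCP this typically lands in the tens of microseconds.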

Re: [Beowulf] 10G networking?

2015-01-23 Thread John Hearns
It occurred to me the other day that it's about time we had something better than 1GE for commodity networking. It's good news that switch costs are coming down but I've yet to see a server with an onboard 10gT adaptor (although I have seen some with SFP+ 10g). Supermicro systems are available

Re: [Beowulf] 10G networking?

2015-01-23 Thread Robert Horton
On Fri, 2015-01-23 at 01:36 -0500, Mark Hahn wrote: > Hi all, > I'd appreciate any comments about the state of 10G as a reasonable > cluster network. Have you done any recent work on 10G performance? > > https://lwn.net/Articles/629155/ I had a go at using RoCE with some Mellanox NICs a year or