Rahul,
Rahul Nabar wrote:
> I have seen a considerable performance boost for my codes by using
> Jumbo Frames. But are there any systematic tools or strategies to
> select the optimum MTU size?
There is no optimal MTU size. The MTU is only the maximum payload you
can fit in one packet, so there is no drawback to a bigger MTU. Actually,
there is one in terms of wormhole switching, but switch contention is an
issue happily ignored by most HPC users.
> external world required of the interfaces) Have you guys found
> performance to be MTU sensitive?
A large MTU means fewer packets for the same amount of data transferred.
In all stack processing, there is a per-packet overhead (header decoding,
integrity checks, sequence numbers, etc.) and a per-byte overhead (copy).
A large MTU reduces the total per-packet overhead because there are fewer
packets to process.
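
To put rough numbers on that, here is a minimal Python sketch of the
packet count and a toy cost model. The 40-byte header (IPv4 + TCP
without options) and the per-packet and per-byte costs are assumptions
picked for illustration, not measurements from any real stack.

```python
def packets_needed(total_bytes, mtu, header_bytes=40):
    """Number of packets needed to carry total_bytes of payload.

    header_bytes=40 assumes IPv4 + TCP headers without options.
    """
    payload = mtu - header_bytes
    if payload <= 0:
        raise ValueError("MTU must be larger than the header size")
    # Integer ceiling division: a partial last packet still counts.
    return (total_bytes + payload - 1) // payload


def stack_cost_us(total_bytes, mtu, per_packet_us=1.0, per_byte_us=0.001,
                  header_bytes=40):
    """Toy model: fixed cost per packet plus copy cost per byte.

    per_packet_us and per_byte_us are hypothetical values.
    """
    n_packets = packets_needed(total_bytes, mtu, header_bytes)
    return n_packets * per_packet_us + total_bytes * per_byte_us


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, packets_needed(1_000_000, mtu),
              stack_cost_us(1_000_000, mtu))
```

In this model the per-byte term is the same for both MTUs; only the
per-packet term shrinks as the MTU grows.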
Most 10GE NICs have no problem reaching line rate at 1500 bytes (the
standard Ethernet MTU); the problem is the host OS stack (mainly TCP),
where the per-packet overhead is significant. One trick that all 10GE
NICs worth their salt do these days is to fake a large MTU at the OS
level while keeping the wire MTU at 1500 bytes (for compatibility).
This is called TSO (TCP Segmentation Offload) on the send side and LRO
(Large Receive Offload) on the receive side. The OS stack uses a virtual
MTU of 64K and the NIC does segmentation/reassembly in hardware, sort of.
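
Conceptually, the split looks like the sketch below. It is only a
software illustration of the idea, not how any NIC implements it; the
function names and the 1460-byte segment size (1500 minus 40 bytes of
IPv4 + TCP headers) are assumptions.

```python
def tso_segment(buffer, mss=1460):
    """Cut one large send buffer into wire-sized segments.

    Stands in for what the NIC does on transmit; mss=1460 assumes a
    1500-byte wire MTU with 40 bytes of headers.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [buffer[i:i + mss] for i in range(0, len(buffer), mss)]


def lro_coalesce(segments):
    """Merge in-order segments back into one large buffer.

    Stands in for what the NIC does on receive.
    """
    return b"".join(segments)


if __name__ == "__main__":
    big = bytes(64 * 1024)          # one 64K "virtual MTU" send
    wire = tso_segment(big)
    print(len(wire), "wire segments for 1 pass through the OS stack")
    assert lro_coalesce(wire) == big
```

The point is that the OS stack pays its per-packet cost once for the
64K buffer, while the wire still carries standard-sized frames.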
> Also, are there any switch side parameters that can affect the
> performance of HPC codes? Specifically I was trying to run VASP which
> is known to be latency sensitive.
A large MTU has little to no impact on latency.
> I have a 10 Gig E network with a
> RDMA offload card and am getting average latencies (ping pong) using
> rping of around 14 microsecs in the MPI tests.
It is most likely due to the switch. Try connecting two nodes
back-to-back to measure without it. I don't know what hardware you are
using, but you can get close to 10us latency over TCP with a standard
10GE NIC and interrupt coalescing disabled. With a NIC supporting
OS-bypass (RDMA only makes sense for bandwidth), you should get half
that or better, ideally below 3us.
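
If you want a quick sanity check of the TCP path, a ping-pong test can
be sketched in a few lines of Python. This is a rough sketch, not a
replacement for a proper benchmark: the interpreter adds overhead of its
own, so the absolute numbers will be worse than what a C test reports.
To stay self-contained it runs both ends in one process over loopback;
for a back-to-back measurement the echo side would run on the other
node. All names and defaults here are assumptions.

```python
import socket
import threading
import time


def _recv_exact(sock, n):
    """Read exactly n bytes; TCP may deliver them in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def _echo_server(listener, size, iterations):
    """Accept one connection and echo each message back."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle so small messages are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(iterations):
            conn.sendall(_recv_exact(conn, size))


def pingpong_half_rtt_us(host="127.0.0.1", size=8, iterations=1000):
    """Average half round-trip time in microseconds."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))        # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=_echo_server,
                              args=(listener, size, iterations))
    server.start()

    client = socket.create_connection((host, port))
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(iterations):
        client.sendall(payload)
        _recv_exact(client, size)
    elapsed = time.perf_counter() - start

    client.close()
    server.join()
    listener.close()
    return elapsed / iterations / 2 * 1e6


if __name__ == "__main__":
    print("half round trip: %.1f us" % pingpong_half_rtt_us())
```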
Patrick
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing