Re: [Beowulf] Help with inconsistent network performance

Joe Landman Tue, 18 Dec 2007 15:51:12 -0800

As has been pointed out to me offline, my numbers may be a bit more pessimistic than needed, in part due to pipelining and other effects. If my numbers were the result of a correct analysis, the most you would be able to see from a gigabit link would be about 37 MB/s for 1500 byte packets. This is obviously not the case, so assume this to be a "worst case" analysis (and I am going to go back and review what I seem to have dropped from the TCP bits).
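For reference, the 37 MB/s figure follows from the per-packet estimates in the quoted message below (about 11us on the wire plus about 30us of latency per 1500 byte packet), if every packet pays the full latency with no overlap. A minimal Python sketch of that arithmetic; the names are chosen here for illustration only:

```python
# Worst-case throughput if each packet pays its full latency before the
# next one starts. The inputs are the estimates from the quoted message,
# not measurements.
PACKET_BYTES = 1500
WIRE_TIME_US = 11.0   # estimated time on the wire per packet
LATENCY_US = 30.0     # estimated switch + network stack latency per packet


def serialized_throughput_mb_s(packet_bytes, wire_us, latency_us):
    """Throughput in units of 10^6 bytes per second, with no pipelining."""
    per_packet_s = (wire_us + latency_us) * 1e-6
    return packet_bytes / per_packet_s / 1e6


ceiling = serialized_throughput_mb_s(PACKET_BYTES, WIRE_TIME_US, LATENCY_US)
print(f"{ceiling:.1f} MB/s")
```

This comes to roughly 36.6 MB/s, well under what a gigabit link delivers in practice, which is why the estimate is best read as a worst case.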

Joe


Joe Landman wrote:

Hi Brendan:

Brendan Moloney wrote:
> I have a cluster of 8 Linux machines connected with gigabit
> ethernet (full duplex) to an HP Procurve 2848 switch.  I am using the
> machines to do interactive distributed rendering. I have noticed that
> the final gather stage (where the intermediate images from the render
> nodes are sent back to the viewing node) has "hiccups" in the
> performance.  These
How are they sent?  NFS? Sockets? ...
> hiccups occur with as few as two render nodes, and become more common
> as I add more render nodes. With a 512x512 image the final gather
> usually takes a few milliseconds for each frame, but when the hiccups
> occur it is more like 200+ milliseconds.
Is this "real time" rendering, so that frame rate is the most important aspect?
> Since it is a full duplex switched network, there should not be any
> collisions happening.  Since the image is less than 1 MB total, I don't
There could be blocking ... if one unit grabs the single network pipe of the display node while another node tries to send data, then the late node will back off (well, with TCP it will) in a pre-determined manner.
> think I am saturating the switch.  I have checked the contents of
> /sbin/ifconfig and there are zero erroneous packets being reported. At this
You wouldn't see it there. It would be on the switch, and even then it wouldn't term it a collision. It is a switch behaving normally.
> point I am really at a loss as to what is causing this. Any input on
> things to check would be greatly appreciated.
I assume you have a single gigabit link from the display node to the switch. As you scale up the number of render nodes, you notice more of these "hiccups", scaling about linearly with the number of nodes.
This suggests resource contention. Each 512x512 image would be fragmented into about 175 1500-byte packets. This assumes 8 bit images. If you are using 8 bits per color, 3 colors and an alpha channel, then this is ~700 packets. Each 1500 byte packet takes about 11us to transmit, and has a non-trivial latency associated with it. I will estimate the latency at 30us (this is switch latency of ~5us + network stack latency on each side of about 12.5us). So for each packet, you have about 41us to transfer it. If you have 8 bit images, then this corresponds to 7.2ms per image. There may be some other caching effects that I am missing or have mis-computed. For 32 bits (3x 8 bit color channels + 1 alpha channel), this is looking like 28.8ms for each image. Best case you could do with this is about 34.7 frames per second.
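The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the estimate, not a measurement. The 2**30 bits/s link rate is an assumption of the sketch, chosen because it reproduces the ~11us figure; with 10**9 bits/s a 1500 byte packet takes 12us instead. The function and constant names are hypothetical.

```python
import math

LINK_BITS_PER_S = 2**30  # assumption: reproduces ~11us per 1500 byte packet
LATENCY_US = 30.0        # ~5us switch + ~12.5us network stack on each side


def packets_needed(image_bytes, mtu_bytes):
    """Number of packets, ignoring protocol header overhead."""
    return math.ceil(image_bytes / mtu_bytes)


def frame_time_ms(image_bytes, mtu_bytes,
                  latency_us=LATENCY_US, link_bps=LINK_BITS_PER_S):
    """Transfer time if every packet pays wire time plus full latency."""
    wire_us = mtu_bytes * 8 / link_bps * 1e6
    return packets_needed(image_bytes, mtu_bytes) * (wire_us + latency_us) / 1000.0


for label, bytes_per_pixel in (("8 bit", 1), ("32 bit RGBA", 4)):
    size = 512 * 512 * bytes_per_pixel
    t = frame_time_ms(size, 1500)
    print(f"{label}: {packets_needed(size, 1500)} packets, "
          f"{t:.1f} ms per image, {1000.0 / t:.1f} frames/s at best")
```

Because each packet is charged the full latency, this models the no-pipelining worst case only.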
If, on the other hand, you used jumbo frames with 9000 byte packets, you would need 30 packets to transfer each image. Each packet would require 67.1us to move, and still 30us of latency, for 97.1us per packet. For 30 packets, this is 2.9ms. For the 32 bit version as indicated previously (3x 8 bit color channels, and one alpha channel) this would be about 11.6ms, or 85.9 frames per second.
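A side-by-side version of the same estimate for the two packet sizes, again as a hedged sketch: the 2**30 bits/s link rate is an assumption that matches the 11us and 67.1us figures, and the names are hypothetical. Counting packets directly for the 32 bit image (117 rather than 4 x 30) gives a result slightly below 11.6ms.

```python
import math

LINK_BITS_PER_S = 2**30  # assumption: matches the 11us and 67.1us figures
LATENCY_US = 30.0        # per-packet latency estimate from the text


def frame_time_ms(image_bytes, mtu_bytes):
    """Image transfer time if every packet pays wire time plus full latency."""
    packets = math.ceil(image_bytes / mtu_bytes)
    wire_us = mtu_bytes * 8 / LINK_BITS_PER_S * 1e6
    return packets * (wire_us + LATENCY_US) / 1000.0


rgba_bytes = 512 * 512 * 4  # 3x 8 bit color channels + 1 alpha channel
standard = frame_time_ms(rgba_bytes, 1500)
jumbo = frame_time_ms(rgba_bytes, 9000)
print(f"MTU 1500: {standard:.1f} ms, MTU 9000: {jumbo:.1f} ms, "
      f"ratio {standard / jumbo:.2f}")
```

In this model the gain comes entirely from paying the 30us latency fewer times per image.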
Based on this, I would suggest seeing if changing mtu to 9000 helps.

    ifconfig eth0 mtu 9000

on all your nodes (every one).
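One way to confirm the setting took effect on a node is to read the MTU back from Linux sysfs. The sketch below assumes the interface is called eth0, as in the command above; read_mtu is a hypothetical helper, not part of any tool mentioned here. It would need to be run on every node, and the switch also has to pass jumbo frames for the change to help.

```python
from pathlib import Path


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Return the MTU Linux reports for iface, or None if it cannot be read."""
    try:
        return int((Path(sysfs_root) / iface / "mtu").read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    mtu = read_mtu("eth0")  # interface name taken from the command above
    if mtu is None:
        print("eth0: could not read MTU")
    elif mtu != 9000:
        print(f"eth0: MTU is {mtu}, expected 9000")
    else:
        print("eth0: MTU is 9000")
```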
The argument for this is that you have less latency to pay for, even though it takes longer to transfer the payload.
Another possibility is channel bonding on your display node.



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
