Well it turns out that flow control was disabled on the switch, and once we
enabled it the hiccups disappeared and the average RTT was cut in half.
Even with an image size of 1920x1200 and 7 nodes sending to one, the RTTs
are the same as when a single node sends the full image.
Thanks a lot f
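For anyone hitting the same symptom, the pause-frame (flow control) settings
can be inspected and toggled from the host with ethtool. This is only a
sketch: eth0 is a placeholder interface name, support varies by driver, and
the switch port must have flow control enabled as well for it to do anything.

```shell
# show the pause settings currently negotiated by the NIC
ethtool -a eth0

# enable RX and TX pause on the host side (requires root; the switch
# port must also have flow control enabled for it to take effect)
ethtool -A eth0 rx on tx on

# per-NIC statistics; non-zero pause counters are evidence that flow
# control frames are actually being exchanged with the switch
ethtool -S eth0 | grep -i pause
```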
Brendan,
I'm a day late but maybe not a dollar short :-) When I read the original
question, I was going to ask, "do the compute (render) nodes push their
results when ready, or does the head (view) node pull?" and from the
subsequent discussion and clarifications it seems to be the former. And yea
One consideration is the size of the messages being exchanged. Even
today, small packets can markedly reduce switch performance. RFC 2544
compliance is not universal in the Layer 2 world.
gerry
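To put a number on why small packets hurt: a minimum-size 64-byte frame still
costs the 8-byte preamble and 12-byte inter-frame gap on the wire, so the
worst-case frame rate a gigabit switch port must sustain is much higher than
for full-size frames. A quick back-of-envelope check:

```shell
# 64-byte frame + 8-byte preamble + 12-byte inter-frame gap
# = 84 bytes on the wire per frame.
# 1 Gb/s = 125,000,000 bytes/s.
echo $((125000000 / 84))    # 1488095 frames per second, worst case
```

This ~1.49 Mpps figure is the classic RFC 2544 64-byte stress point that not
every Layer 2 device can forward at line rate.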
Jon Forrest wrote:
Brendan Moloney wrote:
Since it is a full duplex switched network, there should not be any collisions happening.
the first message should take <50 us. the broadcast to 5 nodes should
take 2-3 more 50 us times. so at about 200 us, all the slaves will start
the DOS attack on the viewer node's nic...
I am not sure why you compare this to a DOS attack. The same amount of data
(and roughly the same amount of
Greg Lindahl wrote:
On Tue, Dec 18, 2007 at 09:05:41PM -0500, Patrick Geoffray wrote:
No, it just means the NIC supports it.
Well, then how about ethtool -S? That looks like an actual count of
flow control events, so rx flow control events means the switch
must support it in some fashion.
I guess I figured that the data is relatively small compared to the
bandwidth,
I agree, in principle. and relatively small compared to the amount of ram
in the switch as well.
whereas the latency for ethernet is relatively high. I also
not _that_ high, though. with a little tuning (coales
On 12/18/07, Mark Hahn <[EMAIL PROTECTED] > wrote:
>
> > The machines are running the 2.6 kernel and I have confirmed that the
> max
> > TCP send/recv buffer sizes are 4MB (more than enough to store the full
> > 512x512 image).
>
> the bandwidth-delay product in a lan is low enough to not need
> this kind of tuning.
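Mark's point about the bandwidth-delay product is easy to check. Assuming a
LAN round-trip time on the order of 100 microseconds (an assumption, not a
measured value from this thread):

```shell
# BDP = bandwidth * RTT
# 1 Gb/s = 125,000,000 bytes/s; RTT assumed ~100 us
echo $((125000000 * 100 / 1000000))    # 12500 bytes in flight
```

About 12.5 KB can be in flight at once, so the 4 MB socket buffers are far
larger than the pipe needs; buffer size is unlikely to be the bottleneck here.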
I loop with the client side program sending a s
Ok guys, thanks for all the feedback.
I guess I should have provided some more specific details. I am using
sockets with TCP/IP for the final gather stage. I am doing real-time
(volume) rendering. The images are 32-bit (RGBA with 8 bits per channel).
The machines are running the 2.6 kernel and
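As a sanity check on the data volume, using the sizes given in the message
above:

```shell
# 512 x 512 pixels at 4 bytes per pixel (RGBA, 8 bits per channel)
echo $((512 * 512 * 4))    # 1048576 bytes (1 MiB) per full frame
```

At gigabit rates a frame of that size takes on the order of 10 ms on the
wire, so raw bandwidth alone should not produce visible hiccups.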
Hi Greg,
Greg Lindahl wrote:
ethtool -a eth0
and it says RX/TX pause are on, doesn't that mean that the switch
supports it?
No, it just means the NIC supports it. RX means that the NIC will send
PAUSE packets if the host does not consume fast enough (rare) and TX
means that the NIC will stop transmitting when it receives PAUSE frames.
On Tue, Dec 18, 2007 at 06:21:35PM -0500, Patrick Geoffray wrote:
> I don't know about the hardware flow-control implementation in the
> Procurve 2848, and it may just be off by default like most Ethernet
> switches. FWIW, there was no working hardware flow-control on the 10GigE
> Procurve swit
As has been pointed out to me offline, my numbers may be a bit more
pessimistic than needed, in part due to pipelining and other effects. If my
numbers were the result of a correct analysis, the most you would be
able to see from a gigabit link would be about 37 MB/s for 1500 byte
packets. This i
Hi Joe, Brendan
Joe Landman wrote:
Since it is a full duplex switched network, there should not be any
collisions happening. Since the image is less than 1 MB total, I don't
There could be blocking ... if one unit grabs the single network pipe
of the display node while another node tries to send.
Brendan,
If you are doing this via nfs, you should be sure that mounts are
done using the tcp parameter in /etc/fstab. Otherwise you may get
udp, and I have seen problems with that as recently as Fedora 8 this morning!
Mike
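A minimal fstab entry forcing NFS over TCP might look like the following
(server name and paths are made-up examples, not from this thread):

```
# /etc/fstab -- "tcp" forces the mount to use TCP rather than UDP
fileserver:/export/scenes  /mnt/scenes  nfs  tcp  0 0
```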
At 08:03 PM 12/17/2007, Brendan Moloney wrote:
I have a cluster of
Hi Brendan:
Brendan Moloney wrote:
I have a cluster of 8 Linux machines connected with gigabit
ethernet (full duplex) to a HP Procurve 2848 switch. I am using the
machines to do interactive distributed rendering. I have noticed that the
final gather stage (where the intermediate images from the render nodes are
sent back to the viewing node) has "hiccups" in the performance. These
Brendan Moloney wrote:
Since it is a full duplex switched network, there should not be any
collisions happening.
I have a similar situation with a slightly larger cluster.
At first I also thought it was a network performance
problem. But then I ran the iftop program to watch
the network in real time.
final gather stage (where the intermediate images from the render nodes are
sent back to the viewing node) has "hiccups" in the performance. These
as perceived how? do you mean your gather/gui machine pauses? could it
be as simple as allocating memory? (if you do a significant memory
allocation
Brendan Moloney wrote:
Any input on things to check would be greatly appreciated.
It might be useful to run some tools like dstat, top/htop, vmstat and
iostat while performing the rendering and summarise any behaviour which
co-incides with the hiccups.
Do you have Ganglia on your cluster?
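The suggestion above can be as simple as leaving a sampler running during a
render session, for example (vmstat is assumed to be installed, as it is on
most Linux distributions):

```shell
# one-second samples while the renderer runs; watch for columns such as
# swap activity, blocked processes, or interrupt spikes that line up
# in time with the rendering hiccups
vmstat 1 5
```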
I have a cluster of 8 Linux machines connected with gigabit
ethernet (full duplex) to a HP Procurve 2848 switch. I am using the
machines to do interactive distributed rendering. I have noticed that the
final gather stage (where the intermediate images from the render nodes are
sent back to the viewing node) has "hiccups" in the performance.