Hi Amrik:
Amrik Singh wrote:
Hi,
We are running a cluster of 180 diskless compute nodes. 60 of them have
32-bit AMD Sempron processors and the rest have dual-core AMD Athlon 64
processors. The 32-bit machines have 10/100 Mbps Ethernet cards and the
rest have gigabit Ethernet cards. We have four file servers, each hosting
around 3.5 TB on SATA drives connected to 3ware RAID controller cards
configured as RAID 10 arrays. These file servers export the drives
through NFS. Each file server is running 265 nfsd daemons.
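With that many nfsd daemons, it may be worth checking whether the threads are ever all busy at once. On a Linux NFS server, the `th` line in /proc/net/rpc/nfsd begins with the thread count, followed (on 2.6-era kernels) by the number of times every thread was in use simultaneously. A minimal sketch, assuming that layout; newer kernels may always report zero in the second field, and the function names are illustrative only:

```python
"""Sketch: read nfsd thread usage from /proc/net/rpc/nfsd (Linux NFS server)."""
from typing import Optional, Tuple


def parse_nfsd_threads(text: str) -> Optional[Tuple[int, int]]:
    """Return (thread_count, times_all_threads_busy) from the 'th' line.

    Assumes the 2.6-era layout: 'th <threads> <fullcnt> <histogram...>'.
    Returns None if no 'th' line is present.
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "th" and len(fields) >= 3:
            return int(fields[1]), int(fields[2])
    return None


def read_nfsd_threads(path: str = "/proc/net/rpc/nfsd") -> Optional[Tuple[int, int]]:
    """Convenience wrapper: run this on a file server."""
    with open(path) as f:
        return parse_nfsd_threads(f.read())
```

If the second number stays near zero while I/O wait is high, the bottleneck is probably below nfsd (disks or file system) rather than in the thread count.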
The file servers mainly host a large number of small files ranging from
256 KB to 2 MB. The compute nodes primarily search through these files,
so there is a lot of reading and some writing to the file servers.
Recently we started noticing very high (70-90%) I/O wait on the file
servers when the compute nodes are busy. We have tried to optimize NFS by
increasing the number of daemons and the rsize and wsize, but to no avail.
Can someone point us in the right direction as to how we should
troubleshoot this problem?
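Because rsize and wsize were changed, it may help to confirm what each client actually ended up with, since the kernel can cap the values requested at mount time. On Linux clients, /proc/mounts usually lists the effective NFS options. A minimal sketch, assuming rsize/wsize appear in the options field; the function names are illustrative:

```python
"""Sketch: list effective NFS rsize/wsize for each NFS mount on a client."""
from typing import Dict, List


def nfs_mount_sizes(mounts_text: str) -> List[Dict[str, str]]:
    """Parse /proc/mounts text and return rsize/wsize for NFS mounts.

    Each /proc/mounts line is: device mountpoint fstype options dump pass.
    Only fstype 'nfs' and 'nfs4' are reported; '?' means the option was absent.
    """
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            opts[key] = value
        results.append({
            "mountpoint": fields[1],
            "rsize": opts.get("rsize", "?"),
            "wsize": opts.get("wsize", "?"),
        })
    return results


def read_nfs_mount_sizes(path: str = "/proc/mounts") -> List[Dict[str, str]]:
    """Convenience wrapper: run this on a compute node."""
    with open(path) as f:
        return nfs_mount_sizes(f.read())
```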
You might want to look at the read patterns.
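One server-side view of the read patterns is the `proc3` line in /proc/net/rpc/nfsd, which counts calls per NFSv3 procedure since the counters were last reset. With many small files, a large share of lookup, getattr, and access calls relative to read would suggest metadata traffic rather than bulk data transfer. A minimal sketch, assuming NFSv3 clients and the `proc3 <count> <n0> <n1> ...` layout; the function names are illustrative:

```python
"""Sketch: summarise the NFSv3 operation mix from /proc/net/rpc/nfsd."""
from typing import Dict

# NFSv3 procedure names in procedure-number order (RFC 1813).
NFS3_PROCS = [
    "null", "getattr", "setattr", "lookup", "access", "readlink",
    "read", "write", "create", "mkdir", "symlink", "mknod",
    "remove", "rmdir", "rename", "link", "readdir", "readdirplus",
    "fsstat", "fsinfo", "pathconf", "commit",
]


def nfs3_op_counts(text: str) -> Dict[str, int]:
    """Map NFSv3 procedure name -> cumulative call count from the 'proc3' line."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "proc3":
            counts = [int(x) for x in fields[2:]]  # skip the 'proc3' tag and the count
            return dict(zip(NFS3_PROCS, counts))
    return {}


def op_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each operation's share of all calls, as a percentage."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Taking two readings a few minutes apart and subtracting gives the mix for the current workload rather than since boot.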
PS: All the nodes are running SuSE 10.0, the servers are running SuSE 10.0
and 10.1, and all the drives are formatted with reiserfs.
Hmmm... I remember ReiserFS has had problems in the past when file
systems get full or nearly so. There are file tail optimizations you
might want to turn off, and you should use noatime for the mounts. I might
suggest switching to a better file system for your servers (if possible;
it might not be a trivial undertaking), but even then the file system
might not be what is responsible.
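Both mount-option suggestions can be checked mechanically. ReiserFS's `notail` mount option disables tail packing, and `noatime` stops access-time updates on every read. A minimal sketch that scans /etc/fstab-style text for reiserfs entries missing either option; it reads fstab rather than /proc/mounts because the kernel may not echo every file-system-specific option there. The function names are illustrative, and any change takes effect only after a remount:

```python
"""Sketch: flag reiserfs fstab entries that lack 'noatime' or 'notail'."""
from typing import Dict, List

WANTED = ("noatime", "notail")


def reiserfs_missing_options(fstab_text: str) -> Dict[str, List[str]]:
    """Return {mountpoint: [missing options]} for reiserfs entries.

    fstab lines are: device mountpoint fstype options dump pass.
    Blank lines and '#' comments are skipped.
    """
    missing = {}
    for line in fstab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) < 4 or fields[2] != "reiserfs":
            continue
        opts = set(fields[3].split(","))
        lacking = [o for o in WANTED if o not in opts]
        if lacking:
            missing[fields[1]] = lacking
    return missing


def check_fstab(path: str = "/etc/fstab") -> Dict[str, List[str]]:
    """Convenience wrapper: run this on each file server."""
    with open(path) as f:
        return reiserfs_missing_options(f.read())
```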
Grab a copy of atop (google for it) and run it on your file server. See
if the file system is the problem (disk devices running at 80% busy or
higher on reads/writes all the time).
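A disk "busy" figure like atop's is, roughly, the fraction of wall-clock time a device had I/O outstanding. On Linux the same quantity can be derived from the "milliseconds spent doing I/O" counter in /proc/diskstats, which is also what iostat uses for %util. A minimal sketch, assuming the 2.6-style layout (major, minor, name, then at least 11 counters); the function names are illustrative. A 3ware unit shows up as one device, so 100% busy means the array always had work queued, not necessarily that every spindle is saturated:

```python
"""Sketch: per-device busy percentage from two /proc/diskstats samples."""
import time
from typing import Dict


def io_ticks(diskstats_text: str) -> Dict[str, int]:
    """Map device name -> milliseconds spent doing I/O (10th counter).

    Lines with fewer than 14 fields (e.g. 2.6-era partition lines) are skipped.
    """
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 14:
            ticks[fields[2]] = int(fields[12])
    return ticks


def busy_percent(before: str, after: str, interval_s: float) -> Dict[str, float]:
    """Percentage of the interval each device had I/O in flight (like iostat %util)."""
    t0, t1 = io_ticks(before), io_ticks(after)
    elapsed_ms = interval_s * 1000.0
    return {
        dev: min(100.0, 100.0 * (t1[dev] - t0[dev]) / elapsed_ms)
        for dev in t1
        if dev in t0
    }


def sample_busy(interval_s: float = 5.0, path: str = "/proc/diskstats") -> Dict[str, float]:
    """Take two samples interval_s apart and return busy percentages."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.read()
    return busy_percent(before, after, interval_s)
```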
Other possibilities are your file access patterns, what the file server
itself is doing, and whether your networks are being flooded with small
packets (check whether the context-switch rate, csw, is very high, or
whether the number of interrupts or packets is very high).
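The raw counters behind those indicators live in /proc/stat (the `ctxt` line and the first number on the `intr` line) and /proc/net/dev (per-interface packet counts). A minimal sketch that turns two samples into per-second rates; there is no universal threshold for "very high", so comparing against a period when the servers behave normally is probably more telling. The function names are illustrative, and loopback is excluded from the packet totals:

```python
"""Sketch: context-switch, interrupt and packet rates from /proc counters."""
import time
from typing import Dict


def cpu_counters(stat_text: str) -> Dict[str, int]:
    """Extract total context switches ('ctxt') and interrupts ('intr') from /proc/stat."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("ctxt", "intr"):
            # For 'intr', the first number is the total across all IRQs.
            counters[fields[0]] = int(fields[1])
    return counters


def packet_counters(netdev_text: str) -> Dict[str, int]:
    """Sum received and transmitted packets over non-loopback interfaces in /proc/net/dev."""
    rx = tx = 0
    for line in netdev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        if name.strip() == "lo":
            continue
        fields = data.split()
        if len(fields) >= 10:
            rx += int(fields[1])  # receive packets
            tx += int(fields[9])  # transmit packets
    return {"rx_packets": rx, "tx_packets": tx}


def rates(before: Dict[str, int], after: Dict[str, int], interval_s: float) -> Dict[str, float]:
    """Per-second rate for each counter present in both samples."""
    return {k: (after[k] - before[k]) / interval_s for k in after if k in before}


def sample_rates(interval_s: float = 5.0) -> Dict[str, float]:
    """Sample /proc/stat and /proc/net/dev twice, interval_s apart."""
    def snapshot() -> Dict[str, int]:
        with open("/proc/stat") as f:
            counters = cpu_counters(f.read())
        with open("/proc/net/dev") as f:
            counters.update(packet_counters(f.read()))
        return counters

    before = snapshot()
    time.sleep(interval_s)
    return rates(before, snapshot(), interval_s)
```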
Joe
thanks
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf