Hi Rahul,

I implemented a custom NFS solution based on Gentoo Linux for a cluster some time ago, and it has been working fine so far, although it is a very small cluster: an 8-node machine that will be upgraded to 16 nodes this year. Still, some of the codes run there write a lot to disk, so the NFS link could easily be saturated. There are no funds for 10GbE, FC or InfiniBand, so I decided to do NFS plus a local disk on every compute node. This was done mainly to have a way of keeping all compute nodes updated with little effort, not to increase I/O performance. It's also easier to maintain than Lustre, so I went for it. It goes something like this:
- The NFS server is the entry node and has RAID 1. It stores the base install, which is not bootable, plus a small per-node copy of the files that must be writable (/etc and /var); the rest is bind mounted from the base install. I export those directories to the nodes (see the server-side sketch at the bottom of this mail).

- Nodes boot a kernel image by PXE and mount the exported filesystem as /, then write some files (not much data) to /etc and /var at boot; the rest is read-only, with the exception of /tmp and /home (plus some swap for safety reasons), which live on a single local SAS disk in each node. Scratch files typically go to /home, and /tmp is there to keep the pressure off the single link to the NFS server (see the node-side sketch below). I have dedicated one GbE port on each blade to serving/accessing the NFS shares, leaving the other for MPI, which we aren't using anyway because it's too slow for the codes run there.

- All configuration and user management is done in the base install and then rsync'd to each node's /etc and /var, which is a fast procedure by now and can run on the fly without problems for the compute nodes (see the rsync sketch below). Backup is also easy: it's just a backup of the base install, which is always in an "unbootable" state, with no redundant files.

So far it has been working great and scaling nodes is very easy. I would say something like this is feasible for around 300 nodes, given how little pressure it puts on the network: the nodes basically only load the executable and shared libraries at the start of a job, and that's it. I can provide the scripts I have set up to do this if you want to take a look at them.

Best regards,
Tiago Marques

On Thu, Sep 24, 2009 at 10:54 PM, Rahul Nabar <rpna...@gmail.com> wrote:
> On Thu, Sep 10, 2009 at 11:18 AM, Joe Landman
> <land...@scalableinformatics.com> wrote:
>>
>> r...@dv4:~# mpirun -np 4 ./io-bm.exe -n 32 -f /data2/test/file -r -d -v
>
> In order to see how good (or bad) my current operating point is I was
> trying to replicate your test. But what is "io-bm.exe"? Is that some
> proprietary code or could I have it to run a similar test?
>
> --
> Rahul
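
P.S. To make the first point above more concrete, here is roughly what the server side could look like. The paths, hostnames and addresses (/srv/base, /srv/nodes, /srv/nodes-rw, node01, 192.168.1.0/24) are placeholders for the sketch, not my actual layout:

    # /etc/fstab on the NFS server (illustrative paths): each node gets its own
    # writable copy of /etc and /var, everything else is the shared base install
    /srv/base                 /srv/nodes/node01      none  bind  0 0
    /srv/nodes-rw/node01/etc  /srv/nodes/node01/etc  none  bind  0 0
    /srv/nodes-rw/node01/var  /srv/nodes/node01/var  none  bind  0 0

    # /etc/exports on the server: the node's root tree is read-only,
    # its /etc and /var are exported read-write
    /srv/nodes/node01      192.168.1.0/24(ro,no_root_squash,no_subtree_check)
    /srv/nodes/node01/etc  192.168.1.0/24(rw,no_root_squash,no_subtree_check)
    /srv/nodes/node01/var  192.168.1.0/24(rw,no_root_squash,no_subtree_check)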
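
The node side is then just an NFS root plus the local disk. Again, the server address, device names and filesystem types here (192.168.1.1, /dev/sda, ext3) are only assumptions for the sketch:

    # pxelinux.cfg entry (illustrative): the kernel gets its root filesystem over NFS
    LABEL compute
      KERNEL vmlinuz-node
      APPEND ip=dhcp root=/dev/nfs nfsroot=192.168.1.1:/srv/nodes/node01,ro

    # /etc/fstab on a compute node: rw NFS mounts for /etc and /var on top of the
    # read-only root, with swap, /tmp and /home on the single local SAS disk
    192.168.1.1:/srv/nodes/node01/etc  /etc   nfs   rw,hard,intr  0 0
    192.168.1.1:/srv/nodes/node01/var  /var   nfs   rw,hard,intr  0 0
    /dev/sda1  none   swap  sw        0 0
    /dev/sda2  /tmp   ext3  defaults  0 0
    /dev/sda3  /home  ext3  defaults  0 0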
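
And keeping the per-node copies in sync with the base install is little more than rsync in a loop, something along these lines (same illustrative paths as above):

    #!/bin/sh
    # Push configuration and user changes from the base install to every node's
    # writable copy; only /etc and /var are touched, so it can run on the fly.
    for node in node01 node02 node03 node04 node05 node06 node07 node08
    do
        rsync -a /srv/base/etc/ /srv/nodes-rw/$node/etc/
        rsync -a /srv/base/var/ /srv/nodes-rw/$node/var/
    done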