Hi folks,

Following the advice and suggestions we received, we have started looking into booting our nodes over Gbit Ethernet. Our OS of choice is Scientific Linux 6.3 (SL6.3) on all master and client nodes. There are plenty of guides and instructions on the net, but I mainly worked from these two:

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/diskless-nfs-config.html
http://www.linuxquestions.org/questions/red-hat-31/building-a-diskless-redhat-enterprise-linux-cluster-765393/

After a few days of struggling with the system, here is what I have done:
 * install SL6.3 on the master node
 * install a DHCP server (dhcpd) on the master node
 * install xinetd and enable tftp
 * open the firewall for tftp and dhcpd using iptables
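For reference, a minimal sketch of the DHCP side of those steps. It only prints a candidate dhcpd.conf so it can be reviewed before being copied to /etc/dhcp/dhcpd.conf. The helper name gen_dhcpd_conf, the master address 192.168.200.1 and the address range are assumptions made for illustration; only the 192.168.200.x network and the pxelinux.0 file name come from this setup. On the firewall side, DHCP needs UDP port 67 and tftp needs UDP port 69 open on the master.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal dhcpd.conf for PXE clients.
# All addresses passed in are assumptions, not values from the real cluster.
gen_dhcpd_conf() {
    server_ip="$1"     # master node running dhcpd and tftp (assumed address)
    range_start="$2"   # first address handed out to client nodes (assumed)
    range_end="$3"     # last address handed out to client nodes (assumed)
    cat <<EOF
allow booting;
allow bootp;

subnet 192.168.200.0 netmask 255.255.255.0 {
    range ${range_start} ${range_end};
    next-server ${server_ip};
    filename "pxelinux.0";
}
EOF
}

# Print the config for review; redirect to a file once it looks right.
gen_dhcpd_conf 192.168.200.1 192.168.200.100 192.168.200.200
```

Here next-server points the clients at the tftp server, and filename is relative to the tftp root (/var/lib/tftpboot in this setup).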

Those steps were enough for me to PXE-boot the SL6.3 LiveCD on a client node. The LiveCD boots fine and I can get to the desktop, but I could not get any further :(. I can't install from it because these are diskless nodes.

What I did next:
 * install and enable the NFS server
 * open the firewall (iptables) for the NFS services

Booting the SL6.3 LiveCD again, I still could not see an NFS mount point to install the system onto. The next attempt was rsync. The first rsync was of the current system on the master node (which runs a lot of different services such as dhcpd, nfs, xinetd, tftp):

$ rsync -a -e ssh --exclude='/proc/*' --exclude='/sys/*' localhost:/ /diskless/hostroot

where hostroot is exported through the NFS server:

$ cat /etc/exports
/diskless *(rw,sync,no_root_squash)
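One thing that may be worth double-checking with the rsync above: since the source is / on the master, the copy also walks into /diskless itself, and plain -a does not carry hard links, ACLs or extended attributes (which is where SELinux labels are stored). Below is a hedged variant of the command, printed rather than executed so it can be reviewed first. The helper name print_root_rsync is made up for illustration, and whether the extra flags matter for this particular setup is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an rsync command for copying a root
# filesystem. -H keeps hard links, -A keeps ACLs, -X keeps extended
# attributes, and --numeric-ids avoids remapping uid/gid by name.
print_root_rsync() {
    src="$1"    # e.g. localhost:/
    dest="$2"   # e.g. /diskless/hostroot
    printf 'rsync -aHAX --numeric-ids -e ssh'
    # /diskless/* is excluded so the copy does not descend into its own target.
    for p in '/proc/*' '/sys/*' '/diskless/*'; do
        printf " --exclude='%s'" "$p"
    done
    printf ' %s %s\n' "$src" "$dest"
}

print_root_rsync localhost:/ /diskless/hostroot
```

Printing the command first makes it easy to compare against the original invocation before running anything as root.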

I then edited /diskless/hostroot/etc/fstab as instructed:

$ cat /diskless/hostroot/etc/fstab
none     /tmp       tmpfs    defaults          0 0
none     /dev/shm   tmpfs    defaults          0 0
none     /dev/pts   devpts   gid=5,mode=620    0 0
sysfs    /sys       sysfs    defaults          0 0
proc     /proc      proc     defaults          0 0

Finally, this is what I have on the tftp server:

$ ls -l /var/lib/tftpboot/
total 781140
-rw-r--r--. 1 root root 32149978 Nov 16 17:07 initramfs-2.6.32-279.14.1.el6.x86_64.img
-rw-r--r--. 1 root root 730839030 Nov 14 16:22 initrd0.img
-rw-r--r--. 1 root root     26828 Nov 14 16:22 pxelinux.0
drwxr-xr-x. 2 root root      4096 Nov 19 14:40 pxelinux.cfg
-r--r--r--. 1 root root   3987376 Nov 14 16:22 vmlinuz0
-rwxr-xr-x. 1 root root 3989680 Nov 15 23:22 vmlinuz-2.6.32-279.14.1.el6.x86_64
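For completeness, a sketch of what a pxelinux.cfg/default for the NFS-root boot could look like, using the kernel and initramfs names from the listing above. The helper name gen_pxe_default, the label name and the master address 192.168.200.1 are assumptions; the root=nfs:server:/path form follows the Red Hat guide linked earlier. It also assumes the initramfs was built with NFS-root support (on EL6 that normally means having the dracut-network package installed when the image is generated).

```shell
#!/bin/sh
# Hypothetical helper: print a pxelinux.cfg/default entry that boots a
# kernel with its root filesystem on NFS.
gen_pxe_default() {
    kver="$1"        # kernel version string, as in the tftpboot file names
    nfs_server="$2"  # NFS server address (assumed: the master node)
    nfs_root="$3"    # exported directory holding the client root
    cat <<EOF
default diskless
prompt 0

label diskless
    kernel vmlinuz-${kver}
    append initrd=initramfs-${kver}.img root=nfs:${nfs_server}:${nfs_root} rw
EOF
}

# Print the entry for review; it would go in /var/lib/tftpboot/pxelinux.cfg/default.
gen_pxe_default 2.6.32-279.14.1.el6.x86_64 192.168.200.1 /diskless/hostroot
```

The kernel and initrd paths are relative to the tftp root, so they match the files in /var/lib/tftpboot directly.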

OK, booting this system, I could get to the desktop login on the client node but could not log in (actually, I was able to log in but got kicked out right afterwards). ssh to the client node did the same thing: logged in, then kicked out immediately. I don't know what was wrong :(.

Next, instead of rsyncing the current master system, I tried to install a fresh root with yum groupinstall:

$ yum -y groupinstall "Base" "Server Platform" --installroot=/diskless/root

but then I got a bunch of dependency errors. I asked on the SL forum/mailing list about those errors but have not gotten a good solution yet.

So finally I plugged a USB stick into a client node, booted the LiveCD again, installed a new system onto the USB stick, and then rsynced from that system instead of the master node's system:

$ rsync -a -e ssh --exclude='/proc/*' --exclude='/sys/*' 192.168.200.2:/ /diskless/clientroot

Unfortunately this system would not boot. It got stuck at something like

INFO: task flush-0:18:1924 blocked for more than 120 seconds.

So to summarize:
 * boot using LiveCD -> OK, can log in fine
 * boot using rsync of master node's system -> boots OK, can't log in
 * boot using rsync of client node's system -> can't boot
 * install client root using groupinstall -> can't do (dependency errors)

So, what should I do next? Please advise,

Thanks,

D.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
