Hi John,

On May 11, 2018, at 12:11 AM, John Hearns <[email protected]> wrote:

> Regarding NFS shares and Python, and plenty of other packages too: pay attention to where the NFS server is located on your network. The NFS server should be part of your cluster, or at least have a network interface on your cluster fabric.
> If you perhaps have a home directory server which is a campus NFS server and you are NATting via your head node, then every time a parallel multi-node job starts up you will pull in libraries multiple times, and this will be a real performance bottleneck.

The NFS server is part of the cluster (same IP subnet/VLAN). I know that in the networking world that is the wrong assumption to make, but the NFS server is physically in the same rack as the cluster.

NFS server / head node:
    inet 10.112.0.25  netmask 255.255.255.192  broadcast 10.112.0.63
Execute nodes (one example):
    inet 10.112.0.5   netmask 255.255.255.192  broadcast 10.112.0.63
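For concreteness, here is a minimal sketch of how that mount is plumbed, reconstructed from the df output further down. The export options shown are typical choices, not taken from the actual config. On the head node (10.112.0.25), /etc/exports would contain something like:

    # export the cluster share to the /26 cluster subnet (10.112.0.0-63)
    /media/cluster 10.112.0.0/26(rw,sync,no_subtree_check)

and each execute node would carry a matching /etc/fstab entry:

    # mount the head node's share at /nfs/cluster; _netdev waits for the network
    10.112.0.25:/media/cluster  /nfs/cluster  nfs  defaults,_netdev  0  0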
> You do have to have a home directory mounted on the nodes - either the user's real home directory or something which looks like a home directory. Oodles of software packages depend on dot files in the home directory, and you won't get far without one.

Right now each node has a local user home directory. Do you suggest that I move/create the users' home directories on the NFS share?

> Eric, my advice would be to definitely learn the Modules system and implement modules for your users.

I definitely have to learn more about the Modules system and how to implement it; my work is heading in that direction.

> Also if you could give us some idea of your storage layout this would be good.

I hope this is what you meant:

Head node:

    eric@radoncmaster:/$ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    udev            7.8G     0  7.8G   0% /dev
    tmpfs           1.6G  740K  1.6G   1% /run
    /dev/sda1       902G  3.3G  853G   1% /
    tmpfs           7.9G     0  7.9G   0% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup
    /dev/sdb1       3.6T  572M  3.4T   1% /media/cluster
    tmpfs           1.6G     0  1.6G   0% /run/user/1000

Execute node (one example):

    eric@radonc01:~$ df -h
    Filesystem                     Size  Used Avail Use% Mounted on
    udev                            32G     0   32G   0% /dev
    tmpfs                          6.3G  984K  6.3G   1% /run
    /dev/mapper/radonc01--vg-root  2.7T  2.5G  2.5T   1% /
    tmpfs                           32G     0   32G   0% /dev/shm
    tmpfs                          5.0M     0  5.0M   0% /run/lock
    tmpfs                           32G     0   32G   0% /sys/fs/cgroup
    /dev/sda2                      473M  128M  321M  29% /boot
    10.112.0.25:/media/cluster     3.6T  571M  3.4T   1% /nfs/cluster
    tmpfs                          6.3G     0  6.3G   0% /run/user/1000

_____________________________________________________________________________________________________
Eric F. Alemany
System Administrator for Research
Division of Radiation & Cancer Biology
Department of Radiation Oncology
Stanford University School of Medicine
Stanford, California 94305
Tel: 1-650-498-7969 (no texting)
Fax: 1-650-723-7382
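As a concrete starting point for the Modules work discussed above, a minimal Environment Modules (Tcl) modulefile for a Python installed on the NFS share could look like the following. The /nfs/cluster/apps prefix and the version are hypothetical, not existing paths:

    #%Module1.0
    ## Hypothetical modulefile for a shared Python install
    ## (the prefix below is an assumed layout under the NFS share)
    proc ModulesHelp { } {
        puts stderr "Python 3.6 installed on the cluster NFS share"
    }
    module-whatis "Python 3.6 (shared install under /nfs/cluster)"

    set root /nfs/cluster/apps/python/3.6
    prepend-path PATH            $root/bin
    prepend-path LD_LIBRARY_PATH $root/lib
    prepend-path MANPATH         $root/share/man

Saved as, say, python/3.6 under the modulefiles directory, users on any node that mounts the share would pick it up with "module load python/3.6" and drop it again with "module unload python/3.6".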
On 11 May 2018 at 08:55, Miguel Gutiérrez Páez <[email protected]> wrote:

Hi,

I install all my apps on shared storage and change the environment variables (PATH, etc.) with lmod. It's very useful.

Regards.

On Fri, 11 May 2018 at 6:19, Eric F. Alemany <[email protected]> wrote:

Hi Lachlan,

Thank you for sharing your environment. Everyone has their own set of rules and I appreciate everyone's input. It seems as if the NFS share is a great place to start.

Best,
Eric

On May 10, 2018, at 4:23 PM, Lachlan Musicman <[email protected]> wrote:

On 11 May 2018 at 01:35, Eric F. Alemany <[email protected]> wrote:

> Hi All,
> I know this might sound like a very basic question: where in the cluster should I install Python and R? Head node? Execute nodes? And is there a particular directory (path) where I should install Python and R?
> Background:
> - SLURM on Ubuntu 18.04
> - 1 head node
> - 4 execute nodes
> - NFS shared drive among all nodes

Eric,

To echo the others: we have a /binaries NFS share that utilises the standard Environment Modules software, so that researchers can manipulate their $PATH on the fly with module load/module unload. That share is mounted on all the nodes.

For Python, I use virtualenvs, but instead of activating them, the path is changed by the module file. Personally, I find conda doesn't work very well in a shared environment; it's fine on a personal level.

For R, we have resorted to installing only the main point releases, because we have >700 libraries installed within R and I don't want to reinstall them every time. We also have packrat installed, so researchers can install their own libraries locally as well.

Cheers
L.
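To make Lachlan's virtualenv-behind-a-module approach concrete, here is a rough sketch; the /binaries/venvs path, the environment name, and the package are purely illustrative:

    # One-time setup on the shared /binaries mount (paths are hypothetical):
    python3 -m venv /binaries/venvs/analysis
    /binaries/venvs/analysis/bin/pip install numpy

A matching modulefile then prepends the venv's bin directory instead of sourcing bin/activate (activate mostly just edits PATH and sets VIRTUAL_ENV anyway):

    #%Module1.0
    ## Hypothetical modulefile exposing the virtualenv above
    setenv       VIRTUAL_ENV /binaries/venvs/analysis
    prepend-path PATH        /binaries/venvs/analysis/bin

The advantage over activate is that module unload cleanly reverses the PATH change, which fits the load/unload workflow described above.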
