Thanks Robert, you have given me a lot to think about.

Most of our nodes have around 250 GB SSDs that are largely unpopulated, so I am guessing there is no harm in simply installing the libraries on every node with ansible. Also, our department has a wealth of old HDDs we could repurpose.

My time indeed has a cost, so I will favour a "cheap and dirty" solution to get the ball rolling and try something fancier later. I was intrigued by your tip about LXC, though: I have used LXC on my workstation for the longest time, but I had never considered it in a Beowulf cluster context. That would be a neat thing to investigate.
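The ansible side of that should amount to little more than an ad-hoc apt call against the workers. A rough, untested sketch (assuming an inventory group called "nodes" and Ubuntu's libgsl-dev package):

    # Install GSL on every worker node in one shot
    ansible nodes -m apt -a "name=libgsl-dev state=present update_cache=yes" --become

    # Quick sanity check that the dynamic linker on each node now sees it
    ansible nodes -a "ldconfig -p" | grep -i gsl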
Anyway, thanks for the tips.

Cheers

On Tue, 28 Jun 2022 at 16:01, Robert G. Brown <r...@phy.duke.edu> wrote:
> On Tue, 28 Jun 2022, leo camilo wrote:
>
> > I see, so if I understand it correctly I have to make sure that there is a
> > copy of the library, environments and modules on every computational node?
> >
> > I am wondering if I can get around it by using nfs.
>
> The answer is yes, although it is a bit of a pain.
>
> Two ways to proceed:
>
> Export the library directory(s) from your head node -- at least /usr/lib
> (this assumes, BTW, that the head node and worker nodes are running
> exactly the same version of linux updated to exactly the same level --
> especially the kernel).  Mount it on an alternative path e.g.
> /usr/local/lib or /usr/work/lib e.g. during/after boot.  Learn how to
> use ldconfig and run it to teach the kernel how to find the libraries
> there.  This approach is simple in that you don't need to worry about
> whether or not any particular library is there or isn't there -- you are
> provisioning "everything" present on your head node, so if it works one
> place it works everywhere else.
>
> The second way may be easier if you are already exporting e.g. a home
> directory or work directory, and only need to provision a few
> applications.  Use Unix tools (specifically ldd) to figure out what
> libraries are needed for your application.  Put copies of those
> libraries in a "personal" link library directory -- e.g.
> /home/joeuser/lib -- and again, use ldconfig as part of your startup/login
> script(s) to teach the kernel where to find them when you run your
> application.
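A note to self, replying inline: on a worker node, that second way should boil down to something like the untested sketch below ("my_app" is a placeholder for the actual binary; /home/joeuser/lib is just the example path from above):

    # On the head node: collect the shared libraries the binary links against
    mkdir -p /home/joeuser/lib
    ldd ./my_app | awk '/=> \//{print $3}' | xargs -I{} cp -v {} /home/joeuser/lib/

    # On each worker (the home directory is NFS-mounted), either per process:
    export LD_LIBRARY_PATH=/home/joeuser/lib:$LD_LIBRARY_PATH

    # ...or system-wide via the linker cache (needs root):
    echo /home/joeuser/lib | sudo tee /etc/ld.so.conf.d/joeuser.conf
    sudo ldconfig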
> A third way is to look into containers -- https://linuxcontainers.org/
> -- which allow you to build "containerized" binaries that contain all of
> their dependencies and in principle run across DIFFERENT linuces,
> kernels, update levels, etc.  The idea there is a containerized app
> doesn't depend directly on the parent operating system "at all" beyond
> running on the right CPU.  An immediate advantage is that if somebody
> decides to change or drop some key library in the future, you don't
> care.  It's containerized.  I have only played with them a bit, mind
> you, but they are supposedly pretty robust and suitable for commercial
> cloud apps etc so they should be fine for you too.
>
> A fourth way -- and this would be my own preference -- would be to just
> install the requisite libraries on the worker nodes (all of which should
> be automagically updated from the primary repo streams anyway to remain
> consistent and up to date).  Hard storage is sooooo cheap.  You could
> put the entire functional part of the OS including all libraries on
> every system for $10 to $20 via a USB thumb drive, assuming that the
> worker nodes don't ALREADY have enormous amounts of unused space.  Speed
> is not likely to be a major issue here as the OS will cache the
> libraries after the initial load assuming that your nodes are
> well-provisioned with RAM, and it has to load the application itself
> anyway.  I can't think of a good reason any more -- with TB hard drives
> very nearly the SMALLEST ones readily available -- to limit what you put
> on a worker node unless you are trying to run it entirely diskless (and
> for the cost, why would you want to do that?).
>
> Remember, YOUR TIME has a cost.  You have 7 worker nodes.  Putting a 128
> GB hard drive on the USB port of each will cost you (say) $15 each, for
> a total of $105 -- assuming that somehow the nodes currently have only 8
> GB and can't easily hold the missing libraries "permanently".  I did
> beowulfery back in the day when storage WAS expensive, and ran entirely
> diskless nodes in many cases that booted from the network, and I assure
> you, it is a pain in the ass and pointless when storage is less than
> $0.10/GB.  There is simply no point in installing "limited" worker
> nodes, picking and choosing what libraries to include or trying to
> assemble an OS image that lacks e.g. GUI support just because you won't
> be putting a monitor and keyboard on them.  Just come up with a standard
> post-install script to run after you do the primary OS install to e.g.
> "dnf -y install gsl" to add in the Gnu scientific library or whatever,
> and ensure that the nodes are all updated at the same time for
> consistency, then forget it.
>
>    rgb
>
> > On Tue, 28 Jun 2022 at 11:42, Richard <e...@trick-1.net> wrote:
> >
> > For what it's worth I use an easy8 licensed bright cluster (now
> > part of NVidia) and I continually find I need to make sure the
> > module packages, environment variables etc are installed/set in
> > the images that are deployed to the nodes
> >
> > Bright supports slurm, k8, jupyter and a lot more
> >
> > Richard
> >
> > Sent from my iPhone
> >
> > > On 28 Jun 2022, at 19:32, leo camilo <lhcam...@gmail.com> wrote:
> > >
> > > # Background
> > >
> > > So, I am building this small beowulf cluster for my department.
> > > I have it running on ubuntu servers, a front node and at the
> > > moment 7 x 16 core nodes.  I have installed SLURM as the
> > > scheduler and I have been procrastinating to set up environment
> > > modules.
> > >
> > > In any case, I ran into this particular scenario where I was
> > > trying to schedule a few jobs in slurm, but for some reason
> > > slurm would not find this library (libgsl).  But it was in fact
> > > installed in the frontnode; I checked the path with ldd and I
> > > even exported the LD_LIBRARY_PATH.
> > >
> > > Oddly, if I ran the application directly in the frontnode, it
> > > would work fine.
> > >
> > > Though it occurred to me that the computational nodes might not
> > > have this library, and sure enough, once I installed this
> > > library on the nodes the problem went away.
> > >
> > > # Question:
> > >
> > > So here is the question: is there a way to cache the frontnode's
> > > libraries and environment onto the computational nodes when a
> > > slurm job is created?
> > >
> > > Will environment modules do that?  If so, how?
> > >
> > > Thanks in advance,
> > >
> > > Cheers
>
> Robert G. Brown                        http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email: r...@phy.duke.edu
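PS: for the fourth way, our nodes run Ubuntu, so the post-install script would use apt rather than dnf. A rough, untested sketch of what I plan to run after the base install (the package list is only a guess at what our jobs actually need):

    #!/bin/sh
    # post-install.sh -- run once on each freshly installed worker node
    set -e
    apt-get update
    apt-get -y install libgsl-dev environment-modules   # GSL + environment modules; extend as needed
    apt-get -y dist-upgrade                              # keep every node at the same patch level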
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf