On 10/28/2018 09:33 AM, Jörg Saßmannshausen wrote:
Hi Prentice,

that sounds somewhat similar to what I did back in my days at UCL:
- login node with development packages
- compute node only what is really needed in terms of software and services

However, if you are removing packages from the comps.xml file manually, how can
you be sure you are not breaking dependencies?

Two ways:

1. If you omit a package that is a dependency but don't explicitly tell your kickstart file not to install it, anaconda should install it when it resolves dependencies. Whenever I go through this process, there are always more RPMs on the final system than I listed, which is the result of this dependency resolution (a minimal sketch follows this list).

2. Testing, testing, testing!  If you create a situation where a dependency cannot be resolved, the kickstart will fail with an error, which you should see when testing your kickstart process.
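
Just to make that concrete, here's a minimal sketch of what I mean (the package names below are placeholders for illustration, not my actual list):

    %packages
    @core
    openssh-server
    nfs-utils
    # glibc is not listed here, but anaconda installs it anyway because
    # the packages above depend on it
    # excluded so they are not pulled in via the @core group:
    -NetworkManager
    -bluez
    %end

Anything you leave out but don't exclude is fair game for the dependency resolver; the "-" entries keep packages that would otherwise come in via a group off the node.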

As I was using Debian, I simply did a bare installation and then installed what
I needed. Once I had a running OS, I rsynced it to a folder on the head node
which served as the 'image' of the compute nodes. Depending on the cluster, I built
the software on the login node and copied it into the folder where the image
was. So during installation the nodes were PXE-booted, the image folder was copied
to the compute nodes, and I only had to install the boot loader (I never really
looked into how to script that as well). It worked quite well.
Upgrading a software package simply meant installing it inside the image
folder (either in a chroot if it was a .deb package, or just copying the files over)
and rsyncing it to the compute nodes.
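
The whole thing boils down to a couple of commands; roughly along these lines (the paths, package and node name below are just placeholders, not what I actually used):

    # install or update a package inside the image on the head node
    chroot /srv/images/compute apt-get install -y somepackage

    # push the image out to a node, leaving the pseudo-filesystems alone
    rsync -aHAX --delete \
        --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/tmp \
        /srv/images/compute/ node001:/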


It was a robust system and I managed to handle the 112 compute nodes I had quite
well. I could even take care of older and newer nodes and install highly
optimised packages on them as well, so nodes which only had AVX got only the
AVX-enabled software and the ones which had AVX2 got the AVX2 builds.
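
Picking the right software tree per node was just a check of the CPU flags; something like this sketch (directory and host names made up for illustration):

    # on the node: choose the optimised tree that matches the CPU
    if grep -qw avx2 /proc/cpuinfo; then
        src=headnode:/srv/software-avx2
    else
        src=headnode:/srv/software-avx
    fi
    rsync -a "$src/" /usr/local/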

It might not be the flashiest solution, but it was KIS: Keep It Simple!

All the best from a rainy London

Jörg

On Tuesday, 23 October 2018 at 13:43:43 GMT, Prentice Bisbal via Beowulf wrote:
Ryan,

When I was at IAS, I pared down what was on the compute nodes
tremendously. I went through the comps.xml file practically line by line
and reduced the number of packages installed on the compute nodes to
only about 500 RPMs. I can't remember all the details, but I remember
omitting the following groups of packages (an illustrative comps.xml
excerpt follows the list):

1. Anything related to desktop environments, graphics, etc.
2. -devel packages
3. Any RPMs for wireless or bluetooth support.
4. Any kind of service that wasn't strictly needed by the compute nodes.
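
For anyone who hasn't poked around in comps.xml, each group is just an XML list of package requirements, something like this (a trimmed, illustrative entry rather than a verbatim copy):

    <group>
      <id>base</id>
      <name>Base</name>
      <packagelist>
        <packagereq type="mandatory">coreutils</packagereq>
        <packagereq type="default">NetworkManager</packagereq>
        <packagereq type="optional">bluez</packagereq>
      </packagelist>
    </group>

Going through those lists group by group is how I decided what to keep and what to leave out.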

In this case, the users' desktops mounted the same home and project
directories and the shared application directory (/usr/local), so the users
had all the GUI, post-processing, and devel packages they needed
right on their desktops, and the cluster was used purely for running
non-interactive batch jobs. In fact, there was no way for a user to even
get an interactive session on the cluster. IAS was a small environment
where I had complete control over the desktops and the cluster, so I was
able to do this. I would do it all again just like that, given a similar
environment.

I'm currently managing a cluster with PU, and PU only puts the -devel
packages, etc. on the login nodes so users can compile their apps
there.

So yes, this is still being done.

There are definitely benefits to providing specialized packages lists
like this:

1. On the IAS cluster, a kickstart installation, including configuration
with the post-install script, was very quick - I think it was 5 minutes
at most.
2. You generally want as few services running on your compute nodes as
possible. The easiest way to keep services from running on your cluster
nodes is to not install those services in the first place.
3. Less software installed = smaller attack surface for security exploits.

Does this mean you are moving away from Warewulf, or are you creating
different Warewulf images for login vs. compute nodes?


Prentice

On 10/23/2018 12:15 PM, Ryan Novosielski wrote:
Hi there,

I realize this may not apply to all cluster setups, but I’m curious what
other sites do with regard to software (specifically distribution
packages, not a shared software tree that might be remote-mounted) for
their login nodes vs. their compute nodes. From what I knew of conventional
wisdom, sites generally place pared-down images on compute nodes,
containing only the runtime. I’m curious to see whether that’s still true, or
whether there are people doing something else entirely, etc.

Thanks.

--
 ____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
