Hi,

We tried it out on Google Cloud with GPU nodes running at another provider,
connected through a site-to-site VPN. The database was on a managed GCloud instance.
There are indeed points that you need to consider:
- Microservice: the maximalist dream "1 process = 1 container" is not possible
for slurmctld/dbd if you need munge - you'll need an entrypoint script which first starts
munge and then the server component. This script runs as the user defined at image
build (or in the Kubernetes deployment settings) and it either has to be root or needs
permission to start the other services (check the entrypoint example at the end)
- User definitions: if you define accounts locally, it can be cumbersome to
have to rebuild images on every modification. A hosted ID provider like an LDAP
server helps; it adds another process to start in the entrypoint (the SSSD part in the
entrypoint example and the sssd.conf sketch at the end)
- Kubernetes networking: pods live in their own IP space controlled by the K8s
scheduler and the common way of exposing resources is through ingress
controllers working in conjunction with a load balancer bound to a static
(public) IP. This obviously impacts how you define your SlurmctldHost and
AccountingStorageHost entries and it also adds inevitable communication hops
(see the slurm.conf sketch at the end). Inside a K8s cluster, services can reach
each other through the internal DNS provided out of the box; we additionally
deployed a cluster-wide DNS to avoid any IP-based configuration
- Persistence: containers are ephemeral environments, so you have to use
volumes for the slurmctld state, the database's data directory if you run it
yourself and, depending on your setup, the logs. In K8s this means your
Deployments need PersistentVolumeClaims (their exact form depends on your
infra provider; see the deployment sketch at the end)
- Logging: if you want to stream the logs, you need to consider a sidecar setup
unless you accept running the log shipping client alongside the server component (at
which point you're definitely banned from calling it a microservice ;) ). We
used Filebeat from the Elastic stack as a sidecar to stream the logs to an ELK
backend (see the sidecar snippet at the end). A service mesh (like Linkerd) can
also provide this out of the box, but it's likely overkill
- Admin: you don't SSH into K8s pods, at least not the way you usually would. You'll
have to get familiar with kubectl (the commands are not that crazy) if you want or
need to look into stuff on the instance itself (see the kubectl examples at the end)

Since horizontal scaling is not applicable, the benefits are limited to those of a
robust clustered setup (with at least 3 worker nodes you get automatic rescheduling
in case of hardware failure, etc.). That lets you avoid running backup
controller/dbd instances, but it's not a panacea, and running a K8s cluster doesn't
come for free, so to speak. In the end we found it easier to stay on VMs and invest
our efforts into good monitoring and incident response systems (which, on the other
hand, are themselves excellent candidates for a K8s deployment).

Hope that helps and don't hesitate to drop me an email if you have further 
questions,
Tilman

----- Example of an entrypoint script for a slurmctld container -----
#! /bin/bash

##### Configurations
configurations_mount_path="/mnt/configurations"
secrets_mount_path="/mnt/secrets"

##### Script
### Validations
[ ! -d "$configurations_mount_path" ] && echo "Configurations mount 
'$configurations_mount_path' not found. Aborting..." && exit 1
[ ! -d "$secrets_mount_path" ] && echo "Secrets mount '$secrets_mount_path' not found. 
Aborting..." && exit 1

### Munge
cp "$secrets_mount_path/munge.key" /etc/munge/munge.key
chown munge: /etc/munge/munge.key
# munged refuses to start if the key is readable by group or others
chmod 400 /etc/munge/munge.key
su -s /bin/bash -c munged munge

### SSSD
# run as root - configurations copied from mount
cp "$configurations_mount_path/sssd.conf" /etc/sssd/sssd.conf
cp "$configurations_mount_path/ldap-certificate.pem" /etc/sssd/ldap-certificate.pem
# sssd insists on its config being root-owned and not world-readable
chmod 600 /etc/sssd/sssd.conf
sssd

### Slurmctld
mkdir -p /etc/slurm /var/spool/slurm /var/run/slurm
# slurm.conf from mount, configure environment
cp "$configurations_mount_path/slurm.conf" /etc/slurm/slurm.conf
export SLURM_CONF=/etc/slurm/slurm.conf
# optional ones
# note: slurm expects the cgroup file to end up as cgroup.conf
[ -f "$configurations_mount_path/cgroups.conf" ] && cp "$configurations_mount_path/cgroups.conf" /etc/slurm/cgroup.conf
[ -f "$configurations_mount_path/gres.conf" ] && cp "$configurations_mount_path/gres.conf" /etc/slurm/gres.conf
[ -f "$configurations_mount_path/plugstack.conf" ] && cp "$configurations_mount_path/plugstack.conf" /etc/slurm/plugstack.conf
# hand fs ownership over to the slurm user (can only be done at runtime because
# the user only exists once sssd is running and the LDAP lookups work)
chown -R slurm: /etc/slurm /var/spool/slurm /var/run/slurm

# exec so slurmctld receives the container's signals (e.g. SIGTERM on pod shutdown)
exec slurmctld -D
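
----- Quick local test of the entrypoint -----
To sanity-check the entrypoint outside of K8s, you can run the image with the two
expected mounts bound to local directories. The image name and the local paths below
are just placeholders:

docker run --rm -it \
  -v "$PWD/configurations:/mnt/configurations:ro" \
  -v "$PWD/secrets:/mnt/secrets:ro" \
  my-registry/slurmctld:latest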
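
----- Sketch of slurm.conf host entries with K8s services in front -----
Service and namespace names are placeholders, adapt them to what your cluster
exposes. Inside the cluster the internal DNS names are enough; the compute nodes on
the VPN side have to use the address published by the ingress/load balancer instead.
Also note that the name before the parentheses has to match the hostname reported
inside the slurmctld container (you can pin it via the pod spec's hostname field).

# slurmctld behind a K8s service named "slurmctld" in namespace "slurm"
SlurmctldHost=slurmctld(slurmctld.slurm.svc.cluster.local)
# slurmdbd behind its own service
AccountingStorageHost=slurmdbd.slurm.svc.cluster.local
# from outside the cluster (GPU nodes over the VPN) point the address part at the
# load balancer / ingress entry instead, e.g.
# SlurmctldHost=slurmctld(slurm-ctl.example.internal)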
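
----- Sketch of a minimal sssd.conf for the LDAP lookups -----
Domain name, LDAP URI and search base are placeholders; the certificate path matches
what the entrypoint above copies into place.

[sssd]
services = nss, pam
domains = company_ldap

[domain/company_ldap]
id_provider = ldap
auth_provider = ldap
ldap_uri = ldaps://ldap.example.org
ldap_search_base = dc=example,dc=org
ldap_tls_cacert = /etc/sssd/ldap-certificate.pem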
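
----- Sketch of a slurmctld Deployment with a PersistentVolumeClaim -----
Names, namespace, image and size are placeholders, the storageClassName (omitted
here) depends on your infra provider, and the /mnt/configurations and /mnt/secrets
mounts used by the entrypoint are left out for brevity.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: slurmctld-state
  namespace: slurm
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slurmctld
  namespace: slurm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: slurmctld
  template:
    metadata:
      labels:
        app: slurmctld
    spec:
      containers:
        - name: slurmctld
          image: my-registry/slurmctld:latest
          volumeMounts:
            - name: state
              mountPath: /var/spool/slurm   # slurmctld state directory
      volumes:
        - name: state
          persistentVolumeClaim:
            claimName: slurmctld-state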
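
----- Sketch of the Filebeat sidecar for log streaming -----
This slots into the containers/volumes lists of a pod spec like the slurmctld
Deployment sketch; the image tag, the log path and the filebeat-config ConfigMap
are placeholders.

      containers:
        - name: slurmctld
          image: my-registry/slurmctld:latest
          volumeMounts:
            - name: logs
              mountPath: /var/log/slurm
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:8.13.4   # match your ELK version
          args: ["-e", "-c", "/etc/filebeat/filebeat.yml"]
          volumeMounts:
            - name: logs
              mountPath: /var/log/slurm
              readOnly: true
            - name: filebeat-config
              mountPath: /etc/filebeat
      volumes:
        - name: logs
          emptyDir: {}          # shared between slurmctld and the sidecar
        - name: filebeat-config
          configMap:
            name: filebeat-config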
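
----- Example kubectl commands replacing the usual SSH workflow -----
The namespace and resource names are the ones used in the other sketches, adapt as
needed.

# what is running (and is it healthy)?
kubectl -n slurm get pods
# tail the slurmctld logs
kubectl -n slurm logs -f deploy/slurmctld
# get a shell inside the running container
kubectl -n slurm exec -it deploy/slurmctld -- /bin/bash
# copy a file out of a pod (use the actual pod name from 'get pods')
kubectl -n slurm cp <slurmctld-pod-name>:/var/log/slurm/slurmctld.log ./slurmctld.log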
