FYI, after more internet sleuthing (searching for “juju slurm”) I came across 
this outstanding-looking project, Omnivector Slurm Distribution (OSD): 
https://omnivector-solutions.github.io/osd-documentation/master/index.html

This project uses Juju (a Canonical project) to deploy, configure, and manage a 
Slurm cluster along with a variety of other components, such as the Slurm REST 
API, Prometheus integration, and log forwarding via Fluent Bit to Graylog, 
among others.

Deployment targets include clouds (AWS, OpenStack), local LXD, MAAS for bare metal…

I’ve only started to play with OSD, but it looks like a great framework for 
deploying Slurm clusters.

Quick install on an Ubuntu 22.04 LTS host:

sudo snap install juju --classic
sudo snap install lxd
lxd init --auto
lxc network set lxdbr0 ipv6.address none
sudo ufw allow 8443/tcp
juju bootstrap --show-log localhost

Followed by a quick test of sinfo:

juju run --unit slurmctld/0 "sinfo"

PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
osd-slurmd    up   infinite      1  down* juju-65df3d-2

juju run --unit slurmctld/0 "sinfo -R"

REASON               USER      TIMESTAMP           NODELIST
New node             slurm     2023-03-15T01:21:21 juju-65df3d-2
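
The node above is reported as down* with the reason "New node". As a minimal sketch, assuming the default six-column sinfo layout shown above, a small helper can pull the affected node names out of that output. The function name down_nodes is hypothetical and is not part of Slurm or OSD:

```shell
#!/bin/sh
# Hypothetical helper: print the NODELIST field of every sinfo row whose
# STATE column starts with "down". Assumes the default column order
# (PARTITION AVAIL TIMELIMIT NODES STATE NODELIST) shown above.
down_nodes() {
    awk 'NR > 1 && $5 ~ /^down/ { print $6 }'
}

# Example using the output captured above. On a live cluster, pipe in the
# output of: juju run --unit slurmctld/0 "sinfo"
printf '%s\n' \
    'PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST' \
    'osd-slurmd    up   infinite      1  down* juju-65df3d-2' | down_nodes
```

In stock Slurm, a node in this state can usually be returned to service with scontrol update nodename=<node> state=resume on the controller. Whether OSD expects a different step for newly added nodes is not confirmed here.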

Mike

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Hanby, Mike <mha...@uab.edu>
Date: Wednesday, February 15, 2023 at 1:51 PM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Running Containerized Slurmctld and Slurmdb in Production?

Howdy,

Just wondering if any sites are running containerized Slurmctld and Slurmdbd in 
production?

We are in the process of planning a migration from a single host running 
slurmctld, slurmdbd, and MySQL (and other HPC services) to separate OpenStack 
VMs. Our site averages fewer than 1,000 running/pending jobs at any given 
time. Like many HPC sites, our jobs are a mix of long-running jobs, large 
arrays, very short jobs…

I ran across this GitHub project, “Slurm Docker Cluster” 
(https://github.com/giovtorres/slurm-docker-cluster), and it got me thinking 
that this method might be great for simpler upgrades, ease of reproducing the 
cluster in development, etc…

How about it, anyone running containerized Slurm server processes in production?

Thanks, Mike
