[slurm-users] Question about IB and Ethernet networks

2024-02-25 Thread Dan Healy via slurm-users
Hi Fellow Slurm Users, This question is not Slurm-specific, but it might develop into that. My question relates to understanding how *typical* HPCs are designed in terms of networking. To start, is it typical for there to be both high-speed Ethernet *and* InfiniBand networks (meaning separate switch

[slurm-users] Re: Question about IB and Ethernet networks

2024-02-26 Thread Dan Healy via slurm-users
connect, even if at the scale of most on-prem work, > you might be hard-pressed in real-world conditions to notice much of a > difference. If you're running jobs that take weeks and hundreds of nodes, > the time (and other) savings may add up, but if we're talking the >

[slurm-users] Re: [ext] Re: canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?

2024-02-28 Thread Dan Healy via slurm-users
Are most of us using HAProxy or something else? On Wed, Feb 28, 2024 at 3:38 PM Brian Andrus via slurm-users < slurm-users@lists.schedmd.com> wrote: > Magnus, > > That is a feature of the load balancer. Most of them have that these days. > > Brian Andrus > > On 2/28/2024 12:10 AM, Hagdorn, Magnus

[slurm-users] Convergence of Kube and Slurm?

2024-05-04 Thread Dan Healy via slurm-users
Bright Cluster Manager has some verbiage on their marketing site that they can manage a cluster running both Kubernetes and Slurm. Maybe I misunderstood it. But nevertheless, I am encountering groups more frequently that want to run a stack of containers that need private container networking. Wha

[slurm-users] Executing srun -n X where X is greater than total CPU in entire cluster

2024-05-16 Thread Dan Healy via slurm-users
Hi there, SLURM community, I swear I've done this before, but now it's failing on a new cluster I'm deploying. We have 6 compute nodes with 64 CPUs each (384 CPUs total). When I run `srun -n 500 hostname`, the task gets queued since there aren't 500 CPUs available. Wasn't there an option that allows

[slurm-users] Re: Executing srun -n X where X is greater than total CPU in entire cluster

2024-05-30 Thread Dan Healy via slurm-users
Following up on this in case anyone can provide some insight, please. On Thu, May 16, 2024 at 8:32 AM Dan Healy wrote: > Hi there, SLURM community, > > I swear I've done this before, but now it's failing on a new cluster I'm > deploying. We have 6 compute nodes with 64 CPUs each (384 CPUs total).

[slurm-users] Can SLURM queue different jobs to start concurrently?

2024-07-08 Thread Dan Healy via slurm-users
Hi there, I've received a question from an end user, to which I presume the answer is "No", but I would like to ask the community first. Scenario: The user wants to create a series of jobs that all need to start at the same time. Example: there are 10 different executable applications which have varyi

[slurm-users] Re: getting slurm going

2024-12-08 Thread Dan Healy via slurm-users
`sinfo` `srun hostname` Thanks, Daniel Healy On Sun, Dec 8, 2024 at 2:30 PM Steven Jones via slurm-users < slurm-users@lists.schedmd.com> wrote: > What tests can I do to prove that slurm is talking to the nodes pls? > > > > > regards > > Steven > > -- > slurm-users mailing list -- slurm-users@list

[slurm-users] Unexpected node got allocation

2025-01-09 Thread Dan Healy via slurm-users
Hello there and good morning from Baltimore. I have a small cluster with 100 nodes. When the cluster is completely empty of all jobs, the first job gets allocated to node 41. In other clusters, the first job gets allocated to node 01. If I specify node 01, the allocation works perfectly. I have my

[slurm-users] Re: Unexpected node got allocation

2025-01-09 Thread Dan Healy via slurm-users
No, sadly there’s no topology.conf in use. Thanks, Daniel Healy On Thu, Jan 9, 2025 at 8:28 AM Steffen Grunewald < steffen.grunew...@aei.mpg.de> wrote: > On Thu, 2025-01-09 at 07:51:40 -0500, Slurm users wrote: > > Hello there and good morning from Baltimore. > > > > I have a small cluster wit

[slurm-users] Priority/Top seems to not be working

2025-03-24 Thread Dan Healy via slurm-users
Hi Slurm Users, I have a newer install (23.11.3) and the priority of nearly all jobs is 1 or another small number. In previous versions, I would see numbers like 2^32. I have the multifactor plugin configured and confirmed it's in use when I show the config. When I run `scontrol top` for a given jo

[slurm-users] slurmrestd equivalent to "srun -n 10 echo HELLO"

2025-03-24 Thread Dan Healy via slurm-users
Hi Slurm Community, I'm starting to experiment with slurmrestd for a new app we're writing. I'm having trouble understanding one aspect of submitting jobs. When I run something like `srun -n 10 echo HELLO`, I get HELLO returned to my console/stdout 10x. When I submit this command as a script to t