[slurm-users] Re: Job running slower when using Slurm

2025-04-24 Thread Jeffrey Layton via slurm-users
nt the > first command with --cpus-per-task=32 > > are you measuring the runtime inside the program or outside it? if > the latter, the 10 sec addition in time could be the Slurm setup/node > allocation > > On Wed, Apr 23, 2025 at 2:41 PM Jeffrey Layton wrote: > >
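
On the inside-vs-outside distinction: a minimal sketch, assuming the batch script is the run-npb-omp mentioned later in the thread and that the binary follows NPB's usual bt.C.x naming:

    # Outside: includes queue wait, node allocation, and prolog overhead.
    time sbatch --wait --nodes=1 --cpus-per-task=32 run-npb-omp

    # Inside: placed in run-npb-omp, this times only the solver itself.
    time ./bt.C.x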

[slurm-users] Re: Job running slower when using Slurm

2025-04-24 Thread Jeffrey Layton via slurm-users
if you use any Python libs. > Best, > > Feng > > > On Wed, Apr 23, 2025 at 3:22 PM Jeffrey Layton via slurm-users < > slurm-users@lists.schedmd.com> wrote: > >> Roger. It's the code that prints out the threads it sees - I bet it is >> the cgroups. I ne
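
A minimal sketch of confirming the cgroups suspicion from inside a running job, using standard Linux interfaces only:

    # Cores the kernel actually allows this task to run on:
    grep Cpus_allowed_list /proc/self/status
    # CPU count as OpenMP and most runtimes will detect it:
    nproc
    # Affinity mask of the current shell:
    taskset -cp $$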

[slurm-users] Re: Job running slower when using Slurm

2025-04-23 Thread Jeffrey Layton via slurm-users
time inside the program or outside it? if > the latter, the 10 sec addition in time could be the Slurm setup/node > allocation > > On Wed, Apr 23, 2025 at 2:41 PM Jeffrey Layton wrote: > > > > I tried using ntasks and cpus-per-task to get all 32 cores. So I added > --ntasks=#

[slurm-users] Re: Job running slower when using Slurm

2025-04-23 Thread Jeffrey Layton via slurm-users
I tried using ntasks and cpus-per-task to get all 32 cores. So I added --ntasks=# --cpus-per-task=N to the sbatch command so that it now looks like: sbatch --nodes=1 --ntasks=1 --cpus-per-task=32
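
The allocation also has to reach the OpenMP runtime. A minimal sketch of a matching batch script (the actual run-npb-omp contents are not shown in the thread; bt.C.x is assumed from NPB's naming convention):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=32

    # Size the OpenMP team to what Slurm granted, not to whatever
    # core count the runtime detects on the node.
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
    ./bt.C.x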

[slurm-users] Re: Job running slower when using Slurm

2025-04-23 Thread Jeffrey Layton via slurm-users
vironment, it's reasonable to > suspect that maybe your OpenMP program is multi-threaded, but Slurm is > constraining your job to a single core. Evidence of this should show > up when running top on the node, watching the cpu% used for the > program > > On Wed, Apr 23, 2025 at 1:28 P
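
A minimal sketch of that top check, run on the compute node while the job is active (the pgrep pattern is an assumption; match it to the real binary name):

    # -H lists individual threads; a program confined to one core shows
    # its threads sharing roughly 100% CPU in total instead of ~3200%.
    top -H -p "$(pgrep -d, -f bt.C)"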

[slurm-users] Job running slower when using Slurm

2025-04-23 Thread Jeffrey Layton via slurm-users
Good morning, I'm running an NPB test, bt.C, that is OpenMP and built using the NV HPC SDK (version 25.1). I run it on a compute node by ssh-ing to the node. It runs in about 19.6 seconds. Then I run the code using a simple job: Command to submit job: sbatch --nodes=1 run-npb-omp The script run-npb-

[slurm-users] Re: Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-12 Thread Jeffrey Layton via slurm-users
up properly bound to the specific > cores they are supposed to be allocated. So definitely proceed with caution > and validate your ranks are being laid out properly, as you will be relying > on mpirun/mpiexec to bootstrap rather than the scheduler. > > -Paul Edmon- > On 8/12/2

[slurm-users] Re: Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-12 Thread Jeffrey Layton via slurm-users
m/scontrol.html#OPT_hostnames). That will give > you the list of hosts your job is set to run on. > > -Paul Edmon- > On 8/12/2024 8:34 AM, Jeffrey Layton via slurm-users wrote: > > Thanks! I admit I'm not that experienced in Bash. I will give this a whirl > as a test. > >
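
A minimal sketch of that scontrol approach, turning the compressed nodelist into a comma-separated list for mpirun (my_mpi_app is a placeholder):

    # Expand e.g. node[01-04] to one host per line, then join with commas.
    HOSTS=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | paste -sd, -)
    mpirun -host "$HOSTS" -np "$SLURM_NTASKS" ./my_mpi_app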

[slurm-users] Re: Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-12 Thread Jeffrey Layton via slurm-users
Thanks! I admit I'm not that experienced in Bash. I will give this a whirl as a test. In the meantime, let me ask, what is the "canonical" way to create the host list? It would be nice to have this in the Slurm FAQ somewhere. Thanks! Jeff On Fri, Aug 9, 2024 at 1:32 PM Hermann Schwärzler via slu

[slurm-users] Annoying canonical question about converting SLURM_JOB_NODELIST to a host list for mpirun

2024-08-09 Thread Jeffrey Layton via slurm-users
Good afternoon, I know this question has been asked a million times, but what is the canonical way to convert the list of nodes for a job that is contained in a Slurm variable (I use SLURM_JOB_NODELIST) to a host list appropriate for mpirun in OpenMPI (perhaps MPICH as well)? Before anyone says,
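
A minimal hostfile variant of the same conversion, hedged as one common answer rather than the canonical one (my_mpi_app is a placeholder):

    # One hostname per line; pass via --hostfile (OpenMPI) or -f (MPICH).
    scontrol show hostnames "$SLURM_JOB_NODELIST" > hosts.txt
    mpirun --hostfile hosts.txt -np "$SLURM_NTASKS" ./my_mpi_app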

[slurm-users] Re: With slurm, how to allocate a whole node for a single multi-threaded process?

2024-08-02 Thread Jeffrey Layton via slurm-users
I think all of the replies point to --exclusive being your best solution (only solution?). You need to know exactly the maximum number of cores a particular application or applications will use. Then you allow other applications to use the unused cores. Otherwise, at some point when the applicatio
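
A minimal sketch of the --exclusive route, which sidesteps the core-counting problem by granting the whole node (my_job.sh is a placeholder):

    # No other jobs share the node, so a multi-threaded process may
    # legitimately use every core it can see.
    sbatch --exclusive --nodes=1 my_job.sh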

[slurm-users] Re: Location of Slurm source packages?

2024-05-15 Thread Jeffrey Layton via slurm-users
Hi Jeff! > > On 5/15/24 10:35 am, Jeffrey Layton via slurm-users wrote: > > > I have an Ubuntu 22.04 server where I installed Slurm from the Ubuntu > > packages. I now want to install pyxis but it says I need the Slurm > > sources. In Ubuntu 22.04, is there a package

[slurm-users] Re: Location of Slurm source packages?

2024-05-15 Thread Jeffrey Layton via slurm-users
> Lloyd > > -- > Lloyd Brown > HPC Systems Administrator > Office of Research Computing > Brigham Young University http://rc.byu.edu > > On 5/15/24 08:35, Jeffrey Layton via slurm-users wrote: > > Good morning, > > I have an Ubuntu 22.04 server where I installed

[slurm-users] Location of Slurm source packages?

2024-05-15 Thread Jeffrey Layton via slurm-users
Good morning, I have an Ubuntu 22.04 server where I installed Slurm from the Ubuntu packages. I now want to install pyxis but it says I need the Slurm sources. In Ubuntu 22.04, is there a package that has the source code? How do I download the sources I need from GitHub? Thanks! Jeff
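
A minimal sketch for Ubuntu 22.04, assuming the packaged Slurm comes from the slurm-wlm source package and that deb-src lines are enabled in /etc/apt/sources.list:

    sudo apt-get update
    apt-get source slurm-wlm          # fetch and unpack the sources
    sudo apt-get build-dep slurm-wlm  # build dependencies, if compiling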

[slurm-users] Re: Integrating Slurm with WekaIO

2024-04-19 Thread Jeffrey Layton via slurm-users
> about the config. > > Simple solution: put a copy of slurm.conf in /etc/slurm/ on the node(s). > > Brian Andrus > On 4/19/2024 9:56 AM, Jeffrey Layton via slurm-users wrote: > > Good afternoon, > > I'm working on a cluster of NVIDIA DGX A100's that is using B
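
A minimal sketch of that simple solution, with a hypothetical node name:

    # Copy the controller's config to the node and restart slurmd so it
    # picks the file up (dgx01 is a placeholder).
    scp /etc/slurm/slurm.conf dgx01:/etc/slurm/slurm.conf
    ssh dgx01 systemctl restart slurmd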

[slurm-users] Integrating Slurm with WekaIO

2024-04-19 Thread Jeffrey Layton via slurm-users
Good afternoon, I'm working on a cluster of NVIDIA DGX A100's that is using BCM 10 (Base Command Manager, which is based on Bright Cluster Manager). I ran into an error and only just learned that Slurm and Weka don't get along (presumably because Weka pins its client threads to cores). I read thr
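
One mitigation often suggested for pinned client threads, not confirmed in this truncated thread, is to keep Slurm off the cores Weka occupies via CpuSpecList in slurm.conf. A hedged sketch with illustrative core IDs and node names:

    # Reserve cores 0-3 for system/Weka use so Slurm never schedules
    # jobs onto them (all values here are placeholders).
    NodeName=dgx[01-04] CPUs=256 CpuSpecList=0-3 State=UNKNOWN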

Re: [slurm-users] Help debugging Slurm configuration

2022-12-08 Thread Jeffrey Layton
urmdLogFile=/var/log/slurm/slurmd.log >> >> and then running 'scontrol reconfigure' >> >> Kind Regards, >> Glen >> >> == >> Glen MacLachlan, PhD >> *Lead High Performance Computing Engineer * >
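
Reassembled from the truncated quote above, a minimal sketch of the advice; the SlurmctldLogFile line is an assumption alongside the SlurmdLogFile line that is visible:

    # In slurm.conf:
    SlurmctldLogFile=/var/log/slurm/slurmctld.log
    SlurmdLogFile=/var/log/slurm/slurmd.log

    # Then apply without restarting the daemons:
    scontrol reconfigure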

Re: [slurm-users] Help debugging Slurm configuration

2022-12-08 Thread Jeffrey Layton
, PhD > *Lead High Performance Computing Engineer * > > Research Technology Services > The George Washington University > 44983 Knoll Square > Enterprise Hall, 328L > Ashburn, VA 20147 > > == > > > > > > > > On Thu, De

[slurm-users] Help debugging Slurm configuration

2022-12-08 Thread Jeffrey Layton
Good afternoon, I have a very simple two-node cluster using Warewulf 4.3. I was following some instructions on how to install the OpenHPC Slurm binaries (server and client). I booted the compute node and the Slurm server says it's in an unknown state. This hasn't happened to me before but I would
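
A minimal sketch of the usual first checks for a node stuck in an unknown or down state (n0001 is a placeholder):

    sinfo -R                          # down/drained nodes with reasons
    scontrol show node n0001          # full node state, including Reason
    # Once slurmd is confirmed running on the node:
    scontrol update NodeName=n0001 State=RESUME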