I wonder whether there might be a core-pinning/NUMA topology/hyperthreading 
sort of thing going on here?
If the code runs faster outside SLURM than under SLURM on the same hardware, 
it might be because some of the cores SLURM has confined the cgroup to are 
hyperthreads on a single physical core.  Or perhaps they’re not allocated to 
the physical sockets in an optimal way… that sort of thing?
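
One way to check this would be to run something like the following both 
over ssh and inside the job, then compare the output. This is only a rough 
sketch: it assumes a Linux node with /proc mounted, and count_cpus is a 
helper name I made up for illustration, not part of Slurm.

```shell
#!/bin/bash
# Count the logical CPUs in a Linux cpulist string such as "0-3,8,10-11".
# (count_cpus is a hypothetical helper, not a Slurm or system command.)
count_cpus() {
    local list=$1 total=0 part lo hi
    local IFS=,
    for part in $list; do
        lo=${part%-*}   # text before the dash, or the whole item if no dash
        hi=${part#*-}   # text after the dash, or the whole item if no dash
        total=$(( total + hi - lo + 1 ))
    done
    echo "$total"
}

# CPUs this process (and its children) are allowed to run on.
if [ -r /proc/self/status ]; then
    allowed=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
    echo "allowed CPUs: $allowed ($(count_cpus "$allowed") logical CPUs)"
fi

# Map logical CPUs to physical cores, sockets and NUMA nodes, if available.
if command -v lscpu >/dev/null 2>&1; then
    lscpu -e=CPU,CORE,SOCKET,NODE
fi
```

If two of the allowed CPUs show the same CORE value in the lscpu output, 
they are hyperthreads of one physical core.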

Tim

--
Tim Cutts
Senior Director, R&D IT - Data, Analytics & AI, Scientific Computing Platform
AstraZeneca


From: Michael DiDomenico via slurm-users <slurm-users@lists.schedmd.com>
Date: Wednesday, 23 April 2025 at 7:53 pm
To:
Cc: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Job running slower when using Slurm
the program probably says 32 threads because it's just looking at the
box, not at what slurm cgroups allow (assuming you're using them) for cpu

i think for an openmp program (not openmpi) you definitely want the
first command with --cpus-per-task=32
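
a minimal sketch of what that batch script might look like, assuming the
site sets SLURM_CPUS_PER_TASK when --cpus-per-task is given (omp_threads
is a made-up helper name, and the cd to the NPB bin directory from the
original script is left out here):

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32

# Pick the OpenMP thread count from what Slurm allocated, falling back to 1
# when the variable is unset (for example when run outside a job).
# (omp_threads is a hypothetical helper name.)
omp_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

export OMP_NUM_THREADS=$(omp_threads)
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Run the benchmark only if it is present in the current directory.
if [ -x ./bt.C.x ]; then
    ./bt.C.x
fi
```

that way the thread count follows the allocation instead of whatever the
program detects on the box.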

are you measuring the runtime inside the program or outside it?  if
the latter, the 10sec addition in time could be the slurm setup/node
allocation
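
to separate the two, one option is to time the program from inside the
job script, so that queueing and setup are excluded. a rough sketch,
where time_cmd is a made-up helper name and the resolution is whole
seconds only:

```shell
#!/bin/bash
# Run a command and report the wall-clock seconds spent in it on stderr.
# Uses bash's SECONDS counter, so resolution is one second.
# (time_cmd is a hypothetical helper name.)
time_cmd() {
    local start=$SECONDS
    "$@"
    local rc=$?
    echo "elapsed_s=$(( SECONDS - start ))" >&2
    return $rc
}

# Usage inside the job script would be, for example:
#   time_cmd ./bt.C.x
```

if job accounting is enabled on the cluster, the Submit, Start, End and
Elapsed fields from sacct for the job give the outside view to compare
against.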

On Wed, Apr 23, 2025 at 2:41 PM Jeffrey Layton <layto...@gmail.com> wrote:
>
> I tried using ntasks and cpus-per-task to get all 32 cores. So I added 
> --ntasks=# --cpus-per-task=N  to the sbatch command  so that it now looks like:
>
> sbatch --nodes=1 --ntasks=1 --cpus-per-task=32 <script>
>
> It now takes 28 seconds (I ran it a few times).
>
> If I change the command to
>
> sbatch --nodes=1 --ntasks=32 --cpus-per-task=1 <script>
>
> It now takes about 30 seconds.
>
> Outside of Slurm it was only taking about 19.6 seconds. So either way it 
> takes longer.
>
> Interestingly, in the output from bt, it gives the Total Threads and Avail 
> Threads. In all cases the answer is 32. If the code was only using 1 thread 
> I'm wondering why it would say Avail Threads is 32.
>
> I'm still not sure why it takes longer when Slurm is being used, but I'm 
> reading as much as I can.
>
> Thanks!
>
> Jeff
>
>
> On Wed, Apr 23, 2025 at 2:15 PM Jeffrey Layton <layto...@gmail.com> wrote:
>>
>> Roger. I didn't configure Slurm so let me look at slurm.conf and gres.conf 
>> to see if they restrict a job to a single CPU.
>>
>> Thanks
>>
>> On Wed, Apr 23, 2025 at 1:48 PM Michael DiDomenico via slurm-users 
>> <slurm-users@lists.schedmd.com> wrote:
>>>
>>> without knowing anything about your environment, it's reasonable to
>>> suspect that maybe your openmp program is multi-threaded, but slurm is
>>> constraining your job to a single core.  evidence of this should show
>>> up when running top on the node, watching the cpu% used for the
>>> program
>>>
>>> On Wed, Apr 23, 2025 at 1:28 PM Jeffrey Layton via slurm-users
>>> <slurm-users@lists.schedmd.com> wrote:
>>> >
>>> > Good morning,
>>> >
>>> > I'm running an NPB test, bt.C that is OpenMP and built using NV HPC SDK 
>>> > (version 25.1). I run it on a compute node by ssh-ing to the node. It 
>>> > runs in about 19.6 seconds.
>>> >
>>> > Then I run the code using a simple job:
>>> >
>>> > Command to submit job: sbatch --nodes=1 run-npb-omp
>>> >
>>> > The script run-npb-omp is the following:
>>> >
>>> > #!/bin/bash
>>> >
>>> > cd /home/.../NPB3.4-OMP/bin
>>> >
>>> > ./bt.C.x
>>> >
>>> >
>>> > When I use Slurm, the job takes 482 seconds.
>>> >
>>> > Nothing really appears in the logs. It doesn't do any IO. No data is 
>>> > copied anywhere. I'm kind of at a loss to figure out why. Any suggestions 
>>> > of where to look?
>>> >
>>> > Thanks!
>>> >
>>> > Jeff
>>> >
>>> >
>>> >
>>> > --
>>> > slurm-users mailing list -- slurm-users@lists.schedmd.com
>>> > To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>>