Hmm... Good idea. I'll start looking at that.

Thanks!

Jeff


On Thu, Apr 24, 2025 at 11:02 AM Cutts, Tim via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> I wonder whether there might be a core-pinning/NUMA topology/hyperthreading
> sort of thing going on here?
>
> If the code run outside SLURM runs faster, on the same hardware, than when
> run under SLURM, it might be because some of the cores SLURM has confined
> the cgroup to are hyperthreads on a single physical core.  Or perhaps
> they’re not allocated to the physical sockets in an optimal way… that sort
> of thing?
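>
> One quick way to check (a sketch; it assumes taskset, nproc and lscpu exist
> on the compute node) is to run something like this inside the batch script
> and compare the allowed CPU set against the physical topology:
>
> taskset -cp $$                   # affinity mask of the job script
> nproc                            # number of usable CPUs from that mask
> lscpu -e=CPU,CORE,SOCKET,NODE    # map logical CPUs to cores/sockets/NUMA nodes
>
> If several of the allowed CPUs share the same CORE value, the job has been
> given hyperthread siblings rather than distinct physical cores.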
>
>
>
> Tim
>
>
>
> --
>
> Tim Cutts
>
> Senior Director, R&D IT - Data, Analytics & AI, Scientific Computing
> Platform
>
> AstraZeneca
>
>
> From: Michael DiDomenico via slurm-users <slurm-users@lists.schedmd.com>
> Date: Wednesday, 23 April 2025 at 7:53 pm
> To:
> Cc: Slurm User Community List <slurm-users@lists.schedmd.com>
> Subject: [slurm-users] Re: Job running slower when using Slurm
>
> the program probably says 32 threads because it's just looking at the
> box, not at what the slurm cgroups allow for cpu (assuming you're using them)
>
> i think for an openmp program (not openmpi) you definitely want the
> first command with --cpus-per-task=32
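>
> a minimal sketch of that, assuming the batch script also sets the OpenMP
> thread count from what slurm actually granted (slurm sets
> SLURM_CPUS_PER_TASK when --cpus-per-task is given):
>
> sbatch --nodes=1 --ntasks=1 --cpus-per-task=32 run-npb-omp
>
> where run-npb-omp looks something like:
>
> #!/bin/bash
> # use the slurm-provided count so OpenMP matches the allocation
> export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
> cd /home/.../NPB3.4-OMP/bin
> ./bt.C.x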
>
> are you measuring the runtime inside the program or outside it?  if
> the latter, the 10-second addition in time could be the slurm setup/node
> allocation
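>
> a quick way to separate the two (a sketch; it assumes job accounting is
> enabled so sacct has data, and <jobid> is a placeholder) is to compare the
> program's own timer with what slurm recorded for the whole job:
>
> sacct -j <jobid> -o JobID,Elapsed,TotalCPU,AllocCPUS
>
> if Elapsed is ~10s longer than the program's internal timing, the extra
> time is job setup/allocation rather than slower compute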
>
> On Wed, Apr 23, 2025 at 2:41 PM Jeffrey Layton <layto...@gmail.com> wrote:
> >
> > I tried using ntasks and cpus-per-task to get all 32 cores. So I added
> --ntasks=# --cpus-per-task=N to the sbatch command so that it now looks
> like:
> >
> > sbatch --nodes=1 --ntasks=1 --cpus-per-task=32 <script>
> >
> > It now takes 28 seconds (I ran it a few times).
> >
> > If I change the command to
> >
> > sbatch --nodes=1 --ntasks=32 --cpus-per-task=1 <script>
> >
> > It now takes about 30 seconds.
> >
> > Outside of Slurm it was only taking about 19.6 seconds. So either way it
> takes longer.
> >
> > Interestingly, in the output from bt, it gives the Total Threads and Avail
> Threads. In all cases the answer is 32. If the code were only using 1 thread,
> I'm wondering why it would say Avail Threads is 32.
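> >
> > One way to see what the OpenMP runtime itself decides under Slurm (a
> > sketch; OMP_DISPLAY_ENV is a standard OpenMP environment variable, not
> > something NPB-specific) would be to have it print its settings at startup:
> >
> > export OMP_DISPLAY_ENV=TRUE
> > ./bt.C.x    # the startup banner includes OMP_NUM_THREADS as the runtime sees it
> >
> > Comparing that banner between the ssh run and the Slurm run would show
> > whether the runtime is actually being handed fewer CPUs inside the job.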
> >
> > I'm still not sure why it takes longer when Slurm is being used, but I'm
> reading as much as I can.
> >
> > Thanks!
> >
> > Jeff
> >
> >
> > On Wed, Apr 23, 2025 at 2:15 PM Jeffrey Layton <layto...@gmail.com>
> wrote:
> >>
> >> Roger. I didn't configure Slurm, so let me look at slurm.conf and
> gres.conf to see if they restrict a job to a single CPU.
> >>
> >> Thanks
> >>
> >> On Wed, Apr 23, 2025 at 1:48 PM Michael DiDomenico via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
> >>>
> >>> without knowing anything about your environment, it's reasonable to
> >>> suspect that maybe your openmp program is multi-threaded, but slurm is
> >>> constraining your job to a single core.  evidence of this should show
> >>> up when running top on the node, watching the cpu% used for the
> >>> program
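> >>>
> >>> something along these lines on the compute node while the job is running
> >>> (a sketch; it assumes a single bt.C.x process):
> >>>
> >>> pid=$(pgrep -f bt.C.x)
> >>> top -H -b -n 1 -p "$pid" | head -45    # -H lists threads; busy threads show high %CPU
> >>> ps -L -o tid,psr,pcpu,comm -p "$pid"   # per thread: which logical cpu (psr) and how busy
> >>>
> >>> if only one thread ever shows real cpu%, the job really is pinned to a
> >>> single core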
> >>>
> >>> On Wed, Apr 23, 2025 at 1:28 PM Jeffrey Layton via slurm-users
> >>> <slurm-users@lists.schedmd.com> wrote:
> >>> >
> >>> > Good morning,
> >>> >
> >>> > I'm running an NPB test, bt.C, which is OpenMP and built using the NV
> HPC SDK (version 25.1). I run it on a compute node by ssh-ing to the node.
> It runs in about 19.6 seconds.
> >>> >
> >>> > Then I run the code using a simple job:
> >>> >
> >>> > Command to submit job: sbatch --nodes=1 run-npb-omp
> >>> >
> >>> > The script run-npb-omp is the following:
> >>> >
> >>> > #!/bin/bash
> >>> >
> >>> > cd /home/.../NPB3.4-OMP/bin
> >>> >
> >>> > ./bt.C.x
> >>> >
> >>> >
> >>> > When I use Slurm, the job takes 482 seconds.
> >>> >
> >>> > Nothing really appears in the logs. It doesn't do any IO. No data is
> copied anywhere. I'm kind of at a loss to figure out why. Any suggestions
> on where to look?
> >>> >
> >>> > Thanks!
> >>> >
> >>> > Jeff
> >>> >
> >>> >
> >>> >
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
