Hmm... good idea. I'll start looking at that. Thanks!
Jeff

On Thu, Apr 24, 2025 at 11:02 AM Cutts, Tim via slurm-users <slurm-users@lists.schedmd.com> wrote:

> I wonder whether there might be a core-pinning/NUMA topology/hyperthreading
> sort of thing going on here?
>
> If the code run outside Slurm runs faster, on the same hardware, than when
> run under Slurm, it might be because some of the cores Slurm has confined
> the cgroup to are hyperthreads on a single physical core. Or perhaps
> they're not allocated to the physical sockets in an optimal way... that sort
> of thing?
>
> Tim
>
> --
> Tim Cutts
> Senior Director, R&D IT - Data, Analytics & AI, Scientific Computing Platform
> AstraZeneca
>
> *From:* Michael DiDomenico via slurm-users <slurm-users@lists.schedmd.com>
> *Date:* Wednesday, 23 April 2025 at 7:53 pm
> *Cc:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* [slurm-users] Re: Job running slower when using Slurm
>
> The program probably says 32 threads because it's just looking at the
> box, not at what the Slurm cgroups allow for CPU (assuming you're using them).
>
> I think for an OpenMP program (not OpenMPI) you definitely want the
> first command, with --cpus-per-task=32.
>
> Are you measuring the runtime inside the program or outside it? If
> the latter, the 10-second addition in time could be the Slurm setup/node
> allocation.
>
> On Wed, Apr 23, 2025 at 2:41 PM Jeffrey Layton <layto...@gmail.com> wrote:
> >
> > I tried using ntasks and cpus-per-task to get all 32 cores. So I added
> > --ntasks=# --cpus-per-task=N to the sbatch command so that it now looks
> > like:
> >
> > sbatch --nodes=1 --ntasks=1 --cpus-per-task=32 <script>
> >
> > It now takes 28 seconds (I ran it a few times).
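[Editor's note: a minimal job script in the spirit of the suggestion above might look like the sketch below. The `OMP_*` exports are an assumption about what is missing from the original run-npb-omp script (the thread never shows them being set); the binary path is kept elided as in the original post.]

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32

# Size the OpenMP runtime to the allocation rather than the whole node.
# Without this, some runtimes spawn a thread per online CPU and the
# threads then contend for whatever cores the cgroup actually granted.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

# Ask the runtime to pin threads, one per physical core, packed close
# together (these standard OpenMP variables are honored by the
# NVHPC, GCC, and LLVM OpenMP runtimes).
export OMP_PROC_BIND=close
export OMP_PLACES=cores

cd /home/.../NPB3.4-OMP/bin   # path elided in the original post
./bt.C.x
```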
> >
> > If I change the command to
> >
> > sbatch --nodes=1 --ntasks=32 --cpus-per-task=1 <script>
> >
> > it now takes about 30 seconds.
> >
> > Outside of Slurm it was only taking about 19.6 seconds. So either way it
> > takes longer.
> >
> > Interestingly, in the output from bt it gives the Total Threads and Avail
> > Threads. In all cases the answer is 32. If the code was only using 1
> > thread, I'm wondering why it would say Avail Threads is 32.
> >
> > I'm still not sure why it takes longer when Slurm is being used, but I'm
> > reading as much as I can.
> >
> > Thanks!
> >
> > Jeff
> >
> > On Wed, Apr 23, 2025 at 2:15 PM Jeffrey Layton <layto...@gmail.com> wrote:
> >>
> >> Roger. I didn't configure Slurm, so let me look at slurm.conf and
> >> gres.conf to see if they restrict a job to a single CPU.
> >>
> >> Thanks
> >>
> >> On Wed, Apr 23, 2025 at 1:48 PM Michael DiDomenico via slurm-users
> >> <slurm-users@lists.schedmd.com> wrote:
> >>>
> >>> Without knowing anything about your environment, it's reasonable to
> >>> suspect that maybe your OpenMP program is multi-threaded, but Slurm is
> >>> constraining your job to a single core. Evidence of this should show
> >>> up when running top on the node, watching the CPU% used for the
> >>> program.
> >>>
> >>> On Wed, Apr 23, 2025 at 1:28 PM Jeffrey Layton via slurm-users
> >>> <slurm-users@lists.schedmd.com> wrote:
> >>> >
> >>> > Good morning,
> >>> >
> >>> > I'm running an NPB test, bt.C, that is OpenMP and built using NV HPC
> >>> > SDK (version 25.1). I run it on a compute node by ssh-ing to the
> >>> > node. It runs in about 19.6 seconds.
> >>> >
> >>> > Then I run the code using a simple job:
> >>> >
> >>> > Command to submit job: sbatch --nodes=1 run-npb-omp
> >>> >
> >>> > The script run-npb-omp is the following:
> >>> >
> >>> > #!/bin/bash
> >>> >
> >>> > cd /home/.../NPB3.4-OMP/bin
> >>> >
> >>> > ./bt.C.x
> >>> >
> >>> > When I use Slurm, the job takes 482 seconds.
> >>> >
> >>> > Nothing really appears in the logs.
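[Editor's note: the point about the program "just looking at the box" can be checked directly from inside an allocation. The probe below is a sketch using standard Linux interfaces, run for example as `srun --cpus-per-task=1 sh probe.sh`; `probe.sh` is a hypothetical filename.]

```shell
#!/bin/sh
# Compare what the whole node advertises with what this process's
# cgroup actually allows. A program that queries the machine (the
# equivalent of getconf) sees the first number; the scheduler only
# grants the CPU ids in Cpus_allowed_list.
echo "CPUs the node reports: $(getconf _NPROCESSORS_ONLN)"
grep Cpus_allowed_list /proc/self/status
echo "SLURM_CPUS_PER_TASK:   ${SLURM_CPUS_PER_TASK:-unset}"
```

If the first number is 32 but Cpus_allowed_list shows a single CPU id, that would match the "Avail Threads is 32 but only one core is usable" symptom described above.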
> >>> > It doesn't do any IO. No data is
> >>> > copied anywhere. I'm kind of at a loss to figure out why. Any
> >>> > suggestions of where to look?
> >>> >
> >>> > Thanks!
> >>> >
> >>> > Jeff
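[Editor's note: the core-pinning/hyperthreading hypothesis at the top of the thread can also be checked on the node. The sketch below assumes `lscpu` (util-linux) is available; it lists the logical-CPU-to-physical-core mapping so the job's allowed CPU ids can be compared against core ids.]

```shell
#!/bin/sh
# Show which logical CPU ids share a physical core. If this job's
# Cpus_allowed_list contains two ids that map to the same CORE value
# below, the job is running on hyperthread siblings of one core
# rather than on distinct physical cores.
grep Cpus_allowed_list /proc/self/status
lscpu --parse=CPU,CORE,SOCKET | grep -v '^#'
```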
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com