[slurm-users] Re: Memory used per node

2024-02-09 Thread Davide DelVento via slurm-users
If you would like the high watermark memory utilization after the job
completes, https://github.com/NCAR/peak_memusage is a great tool. Of course
it has the limitation that you need to know that you want that information
*before* starting the job, which might or might not be a problem for your
use case.
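
For example, something like this minimal sketch in the job script (assuming
the tool is built and installed on your PATH; check the repository for the
exact wrapper name and options, since I am writing this from memory):

  #SBATCH -n 256
  srun peak_memusage ./solver inputfile

Each task then reports its own high watermark when it exits, and you can
aggregate those numbers per node from the job output.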

On Fri, Feb 9, 2024 at 10:07 AM Gerhard Strangar via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hello,
>
> I'm wondering if there's a way to tell how much memory my job is using
> per node. I'm doing
>
> #SBATCH -n 256
> srun solver inputfile
>
> When I run sacct -o maxvmsize, the result apparently is the maximum VSZ
> of the largest solver process, not the maximum of the sum of them all
> (unlike when calling mpirun instead). When I sstat -o TresUsageInMax, I
> get the memory summed up over all nodes being used. Can I get the
> maximum VSZ per node?
>
>
> Gerhard
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Compilation question

2024-02-09 Thread Davide DelVento via slurm-users
Hi Sylvain,
In the spirit of "better late than never": is this still a problem?
If so, is this a new install or an update?
What environment/compiler are you using? The error

undefined reference to `__nv_init_env'

seems to indicate that parts of the build were compiled with the NVIDIA
(formerly PGI) compiler rather than gcc -- something CUDA-toolkit-related
which I think you should not be doing for Slurm itself?

In any case, most people run on a RHEL (or compatible) distro and use
rpmbuild rather than straight configure/make, e.g. a variant of what is
described at https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/
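
For example, a minimal sketch of that route (the dependency list and the
options are placeholders to adapt to your site, not a tested recipe):

  dnf install rpm-build gcc make munge-devel pam-devel readline-devel \
      mariadb-devel perl-ExtUtils-MakeMaker
  rpmbuild -ta slurm-22.05.11.tar.bz2

and then install the resulting RPMs from ~/rpmbuild/RPMS/ on the relevant
nodes.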

Hope this helps,


On Wed, Jan 17, 2024 at 8:36 AM Sylvain MARET 
wrote:

> Hello everyone !
>
> I'm trying to compile slurm 22.05.11 on Rocky linux 8.7 with freeipmi
> support
>
> I've seen the documentation so I've done the configure step :
>
> ./configure --with-pmix=$PMIXHOME --with-ucx=$UCXHOME
> --with-nvml=$NVMLHOME --prefix=$SLURMHOME --with-freeipmi=/usr
>
> but when I run make I end up with the following error :
>
> /bin/sh ../../../../../libtool  --tag=CC   --mode=link gcc
> -DNUMA_VERSION1_COMPATIBILITY -g -O2 -fno-omit-frame-pointer -pthread
> -ggdb3 -Wall -g -O1 -fno-strict-aliasing -export-dynamic -L/usr/lib64
> -lhdf5_hl -lhdf5  -lsz -lz -ldl -lm  -o sh5util sh5util.o
> -Wl,-rpath=/softs/batch/slurm/22.05.11/lib/slurm
> -L../../../../../src/api/.libs -lslurmfull -ldl ../libhdf5_api.la
> -lpthread -lm -lresolv
> libtool: link: gcc -DNUMA_VERSION1_COMPATIBILITY -g -O2
> -fno-omit-frame-pointer -pthread -ggdb3 -Wall -g -O1
> -fno-strict-aliasing -o .libs/sh5util sh5util.o
> -Wl,-rpath=/softs/batch/slurm/22.05.11/lib/slurm -Wl,--export-dynamic
> -L/usr/lib64 -L../../../../../src/api/.libs
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so
> ../.libs/libhdf5_api.a -lhdf5_hl -lhdf5 -lsz -lz -ldl -lpthread -lm
> -lresolv -pthread -Wl,-rpath -Wl,/softs/batch/slurm/22.05.11/lib/slurm
> sh5util.o:(.init_array+0x0): undefined reference to `__nv_init_env'
> sh5util.o:(.init_array+0x8): undefined reference to `__flushz'
> sh5util.o:(.init_array+0x10): undefined reference to `__daz'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_list_transfer_unique'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_sort_key_pairs'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_xstrchr'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_unsetenvp'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_list_sort'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_list_for_each'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `__builtin__pgi_isnanld'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_get_extra_conf_path'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `__blt_pgi_ctzll'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_running_in_slurmctld'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `__c_mcopy1'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `__blt_pgi_clzll'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_list_create'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_list_count'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `__builtin_va_gparg1'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_destroy_config_key_pair'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_xfree_ptr'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_getenvp'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_free_buf'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_get_log_level'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `__c_mset8'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_xstrdup_printf'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_list_delete_first'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_list_append'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_error'
> /softs/batch/slurm/slurm-22.05.11/src/api/.libs/libslurmfull.so:
> undefined reference to `slurm_init_buf'
> /softs/batch/slu

[slurm-users] Re: Need help managing licence

2024-02-16 Thread Davide DelVento via slurm-users
The simple answer is to just add to your slurm.conf a line such as
Licenses=whatever:20

and then request your users to use the -L option as described at

https://slurm.schedmd.com/licenses.html

This works very well; however, it does not do enforcement the way Slurm does
with other resources. You will find posts on this list from me trying to
achieve such enforcement with a prolog, but I ended up banging my head on the
keyboard too much and eventually gave up. User education was easier for me.
Depending on your user community, banging your head on the keyboard might be
easier than educating your users -- if so, please share how you solve the
issue.
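
To make it concrete, a sketch of both sides (the license name "cplex" and the
count of 20 are just examples -- use whatever your license file allows):

In slurm.conf:

  Licenses=cplex:20

In the users' job scripts (or on the sbatch/srun command line):

  #SBATCH -L cplex:1

Slurm will then keep at most 20 such jobs running at once, but as said above
nothing prevents a job that did not request the license from using it anyway.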

On Fri, Feb 16, 2024 at 7:48 AM Sylvain MARET via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hello everyone !
>
> Recently our users bought a cplex dynamic license and want to use it on
> our slurm cluster.
> I've installed the paid version of cplex within modules so authorized
> user can load it with a simple module load cplex/2111 command but I
> don't know how to manage and ensure slurm doesn't launch a job if 20
> people are already running code with this license.
>
> How do you guys manage paid licenses on your cluster ? Any advice would
> be appreciated !
>
> Regards,
> Sylvain Maret
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Recover Batch Script Error

2024-02-16 Thread Davide DelVento via slurm-users
Yes, that is what we are also doing and it works well.
Note that when requesting the batch script of another user's job, one sees
nothing (rather than an error message saying that one does not have
permission).
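
For reference, a minimal sketch of the pieces involved (assuming slurmdbd
accounting is already working):

In slurm.conf:

  AccountingStoreFlags=job_script

and then, once the job has been recorded:

  sacct -j 38960 --batch-script

The same flag also accepts job_env if you want the job environment stored as
well.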

On Fri, Feb 16, 2024 at 12:48 PM Paul Edmon via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Are you using the job_script storage option? If so then you should be able
> to get at it by doing:
>
> sacct -B -j JOBID
>
> https://slurm.schedmd.com/sacct.html#OPT_batch-script
>
> -Paul Edmon-
> On 2/16/2024 2:41 PM, Jason Simms via slurm-users wrote:
>
> Hello all,
>
> I've used the "scontrol write batch_script" command to output the job
> submission script from completed jobs in the past, but for some reason, no
> matter which job I specify, it tells me it is invalid. Any way to
> troubleshoot this? Alternatively, is there another way - even if a manual
> database query - to recover the job script, assuming it exists in the
> database?
>
> sacct --jobs=38960
> JobID   JobName  PartitionAccount  AllocCPUS  State
> ExitCode
>  -- -- -- -- --
> 
> 38960amr_run_v+ tsmith2lab tsmith2lab 72  COMPLETED
>  0:0
> 38960.batch   batchtsmith2lab 40  COMPLETED
>  0:0
> 38960.extern externtsmith2lab 72  COMPLETED
>  0:0
> 38960.0  hydra_pmi+tsmith2lab 72  COMPLETED
>  0:0
>
> scontrol write batch_script 38960
> job script retrieval failed: Invalid job id specified
>
> Warmest regards,
> Jason
>
> --
> *Jason L. Simms, Ph.D., M.P.H.*
> Manager of Research Computing
> Swarthmore College
> Information Technology Services
> (610) 328-8102
> Schedule a meeting: https://calendly.com/jlsimms
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Partition Preemption Configuration Question

2024-05-02 Thread Davide DelVento via slurm-users
Hi Jason,

I wanted exactly the same and was confused exactly like you. For a while it
did not work, regardless of what I tried, but eventually (with some help) I
figured it out.

What I set up, and it is working fine, is this globally:

PreemptType = preempt/partition_prio
PreemptMode=REQUEUE

and then individually each partition definition has either PreemptMode=off
or PreemptMode=cancel

It took me a while to make it work, and the problem in my case was that I
did not include the requeue line because (as I am describing) I did not
want requeue, but without that line slurm preemption simply would not work.
Since it's overridden in each partition, then it works as if it's not
there, but it must be there. Very simple once you know it.
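
To make it concrete, a sketch of the partition side (names, node lists and
priorities are placeholders):

  PartitionName=high Nodes=node[01-10] PriorityTier=100 PreemptMode=off
  PartitionName=low  Nodes=node[01-10] PriorityTier=10  PreemptMode=cancel

With the global settings above, jobs in "high" can then preempt (in this case
cancel) jobs running in "low", while jobs in "high" are never preempted.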

Hope this helps

On Thu, May 2, 2024 at 9:16 AM Jason Simms via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hello all,
>
> The Slurm docs have me a bit confused... I'm wanting to enable job
> preemption on certain partitions but not others. I *presume* I would
> set PreemptType=preempt/partition_prio globally, but then on the partitions
> where I don't want jobs to be able to be preempted, I would set
> PreemptMode=off within the configuration for that specific partition.
>
> The documentation, however, says that setting PreemptMode=off at a
> partition level "is only compatible with PreemptType=preempt/none at a
> global level" yet then immediately says that doing so is a "common use case
> for this parameter is to set it on a partition to disable preemption for
> that partition," which indicates preemption would still be allowable for
> other partitions.
>
> If PreemptType is set to preempt/none globally, and I *cannot* set that as
> an option for a given partition (at least, the documentation doesn't
> indicate that is a valid parameter for a partition), wouldn't preemption be
> disabled globally anyway? The wording seems odd to me and almost
> contradictory.
>
> Is it possible to have PreemptType=preempt/partition_prio set globally,
> yet also disable it on specific partitions with PreemptMode=off? Is
> PreemptType actually a valid configuration option for specific partitions?
>
> Thanks for any guidance.
>
> Warmest regards,
> Jason
>
> --
> *Jason L. Simms, Ph.D., M.P.H.*
> Manager of Research Computing
> Swarthmore College
> Information Technology Services
> (610) 328-8102
> Schedule a meeting: https://calendly.com/jlsimms
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: StateSaveLocation and Slurm HA

2024-05-07 Thread Davide DelVento via slurm-users
Are you seeking something simple rather than sophisticated? If so, you can
use the controller local disk for StateSaveLocation and place a cron job
(on the same node or somewhere else) to take that data out via e.g. rsync
and put it where you need it (NFS?) for the backup control node to use
if/when needed. That obviously introduces a time delay which might or might
not be problematic depending on what kind of failures you are trying to
protect from and with what level of guarantee you wish the HA would have:
you will not be protected in every possible scenario. On the other hand,
given the size of the cluster that might be adequate and it's basically
zero effort, so it might be "good enough" for you.
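
For example, a minimal sketch of such a cron entry on the primary controller
(paths, hostname and frequency are placeholders to adapt to your setup and to
the staleness you can tolerate):

  */5 * * * * rsync -a --delete /var/spool/slurmctld/ backup-ctl:/var/spool/slurmctld/

where /var/spool/slurmctld is the StateSaveLocation and backup-ctl is the
backup controller (or wherever it reads its copy from).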

On Tue, May 7, 2024 at 4:44 AM Pierre Abele via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi all,
>
> I am looking for a clean way to set up Slurms native high availability
> feature. I am managing a Slurm cluster with one control node (hosting
> both slurmctld and slurmdbd), one login node and a few dozen compute
> nodes. I have a virtual machine that I want to set up as a backup
> control node.
>
> The Slurm documentation says the following about the StateSaveLocation
> directory:
>
> > The directory used should be on a low-latency local disk to prevent file
> system delays from affecting Slurm performance. If using a backup host, the
> StateSaveLocation should reside on a file system shared by the two hosts.
> We do not recommend using NFS to make the directory accessible to both
> hosts, but do recommend a shared mount that is accessible to the two
> controllers and allows low-latency reads and writes to the disk. If a
> controller comes up without access to the state information, queued and
> running jobs will be cancelled. [1]
>
> My question: How do I implement the shared file system for the
> StateSaveLocation?
>
> I do not want to introduce a single point of failure by having a single
> node that hosts the StateSaveLocation, neither do I want to put that
> directory on the clusters NFS storage since outages/downtime of the
> storage system will happen at some point and I do not want that to cause
> an outage of the Slurm controller.
>
> Any help or ideas would be appreciated.
>
> Best,
> Pierre
>
>
> [1] https://slurm.schedmd.com/quickstart_admin.html#Config
>
> --
> Pierre Abele, M.Sc.
>
> HPC Administrator
> Max-Planck-Institute for Evolutionary Anthropology
> Department of Primate Behavior and Evolution
>
> Deutscher Platz 6
> 04103 Leipzig
>
> Room: U2.80
> E-Mail: pierre_ab...@eva.mpg.de
> Phone: +49 (0) 341 3550 245
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Partition Preemption Configuration Question

2024-05-08 Thread Davide DelVento via slurm-users
{
  "emoji": "👍",
  "version": 1
}
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: memory high water mark reporting

2024-05-16 Thread Davide DelVento via slurm-users
Not exactly the answer to your question (which I don't know), but if you can
prefix whatever is executed with https://github.com/NCAR/peak_memusage
(which also uses getrusage) or a variant of it, you will be able to get that
information.

On Thu, May 16, 2024 at 4:10 PM Emyr James via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi,
>
> We are trying out slurm having been running grid engine for a long while.
> In grid engine, the cgroups peak memory and max_rss are generated at the
> end of a job and recorded. It logs the information from the cgroup
> hierarchy as well as doing a getrusage call right at the end on the parent
> pid of the whole job "container" before cleaning up.
> With slurm it seems that the only way memory is recorded is by the acct
> gather polling. I am trying to add something in an epilog script to get the
> memory.peak but It looks like the cgroup hierarchy has been destroyed by
> the time the epilog is run.
> Where in the code is the cgroup hierarchy cleared up ? Is there no way to
> add something in so that the accounting is updated during the job cleanup
> process so that peak memory usage can be accurately logged ?
>
> I can reduce the polling interval from 30s to 5s but don't know if this
> causes a lot of overhead and in any case this seems to not be a sensible
> way to get values that should just be determined right at the end by an
> event rather than using polling.
>
> Many thanks,
>
> Emyr
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Best practice for jobs resuming from suspended state

2024-05-16 Thread Davide DelVento via slurm-users
I don't really have an answer for you, just responding to make your message
pop out in the "flood" of other topics we've got since you posted.

On our cluster we configure preemption to cancel jobs because it makes more
sense for our situation, so I have no experience with jobs resuming from
being suspended. I can think of two possible reasons for this:

- one is memory (have you checked your memory logs to see if there is a
correlation between node memory occupation and jobs not resuming correctly?)
- the second one is some resource disappearing (temp files? maybe in some
circumstances slurm totally wipes out /tmp when the second job starts -- if
so, that would be a slurm bug, obviously)

Assuming that you're stuck without finding a root cause which you can
address, I guess it depends on what "doesn't recover" means. It's one thing
if it crashes immediately. It's another if it just stalls without even
starting but slurm still thinks it's running and the users are charged
their allocation -- even worse if your cluster does not enforce a
wallclock limit (or has a very long one). Depending on frequency of the
issue, size of your cluster and other conditions, you may want to consider
writing a watchdog script which would search for these jobs and cancel them?

As I said, not really an answer, just my $0.02 (or even less)

On Wed, May 15, 2024 at 1:54 AM Paul Jones via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi,
>
> We use PreemptMode and PriorityTier within Slurm to suspend low priority
> jobs when more urgent work needs to be done. This generally works well, but
> on occasion resumed jobs fail to restart - which is to say Slurm sets the
> job status to running but the actual code doesn't recover from being
> suspended.
>
> Technically everything is working as expected, but I wondered if there was
> any best practice to pass onto users about how to cope with this state?
> Obviously not a direct Slurm question, but wondered if others had
> experience with this and any advice on how best to limit the impact?
>
> Thanks,
> Paul
>
> --
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Can SLURM queue different jobs to start concurrently?

2024-07-08 Thread Davide DelVento via slurm-users
I think the best way to do it would be to schedule the 10 things as a
single slurm job and then use one of the various MPMD approaches (the nitty
gritty details depend on whether each executable is serial, OpenMP, MPI or
hybrid).
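
For example, if the ten applications are serial or threaded executables, a
sketch with srun's --multi-prog (file names and layout are made up for
illustration):

  #SBATCH --ntasks=10
  srun --multi-prog simulation.conf

where simulation.conf maps task ranks to commands, e.g.

  0    ./app_master
  1-9  ./app_worker %t

All ten tasks are then launched together within the single allocation. For
MPI or hybrid executables the details differ (heterogeneous job components
may be a better fit there).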

On Mon, Jul 8, 2024 at 2:20 PM Dan Healy via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi there,
>
> I've received a question from an end user, which I presume the answer is
> "No", but would like to ask the community first.
>
> Scenario: The user wants to create a series of jobs that all need to start
> at the same time. Example: there are 10 different executable applications
> which have varying CPU and RAM constraints, all of which need to
> communicate via TCP/IP. Of course the user could design some type of
> idle/statusing mechanism to wait until all jobs are *randomly *started,
> then begin execution, but this feels like a waste of resources. The
> complete execution of these 10 applications would be considered a single
> simulation. The goal would be to distribute these 10 applications across
> the cluster and not necessarily require them all to execute on a single
> node.
>
> Is there a good architecture for this using SLURM? If so, please kindly
> point me in the right direction.
>
> --
> Thanks,
>
> Daniel Healy
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: With slurm, how to allocate a whole node for a single multi-threaded process?

2024-08-01 Thread Davide DelVento via slurm-users
In part, it depends on how it's been configured, but have you tried
--exclusive?
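
For instance, something along these lines (a sketch; the cpus-per-task value
should match the core count of the nodes you target):

  #SBATCH --exclusive
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=32
  srun ./my_threaded_program

With --exclusive no other job shares the node; whether all of the node's
memory is granted as well depends on the configuration (requesting --mem=0
may be needed).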

On Thu, Aug 1, 2024 at 7:39 AM Henrique Almeida via slurm-users <
slurm-users@lists.schedmd.com> wrote:

>  Hello, everyone, with slurm, how to allocate a whole node for a
> single multi-threaded process?
>
>
> https://stackoverflow.com/questions/78818547/with-slurm-how-to-allocate-a-whole-node-for-a-single-multi-threaded-process
>
>
> --
>  Henrique Dante de Almeida
>  hda...@gmail.com
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: With slurm, how to allocate a whole node for a single multi-threaded process?

2024-08-02 Thread Davide DelVento via slurm-users
I am pretty sure that with vanilla slurm this is impossible.

What might be possible (maybe) is submitting 5-core jobs and using some
pre/post scripts which, immediately before the job starts, change the
requested number of cores to "however many are currently available on the node
where it is scheduled to run". That feels like a nightmare script to write,
prone to race conditions (e.g. what if slurm has scheduled another job on
the same node to start almost at the same time?). It also may be
impractical (the modified job will probably need to be rescheduled,
possibly landing on another node with a different number of idle cores) or
impossible (maybe slurm does not offer the possibility of changing the
requested cores after the job has been assigned a node, only at other
times, such as submission time).

What is theoretically possible would be to use slurm only as a "dummy bean
counter": submit the job as a 5 core job and let it land and start on a
node. The job itself does nothing other than counting the number of idle
cores on that node and submitting *another* slurm job of the highest
priority targeting that specific node (option -w) and that number of cores.
If the second job starts, then by some other mechanism, probably external
to slurm, the actual computational job will start on the appropriate nodes.
If that happens outside of slurm, it would be very hard to get right (with
the appropriate cgroup for example). If that happens inside of slurm, it
needs some functionality which I am not aware exists, but it sounds more
likely than "changing the number of cores at the moment the job start". For
example the two jobs could merge into one. Or the two jobs could stay
separate, but share some MPI communicator or thread space (but again have
troubles with the separate cgroups they live in).

So in conclusion if this is just a few jobs where you are trying to be more
efficient, I think it's better to give up. If this is something of really
large scale and important, then my recommendation would be to purchase
official Slurm support and get assistance from them

On Fri, Aug 2, 2024 at 8:37 AM Laura Hild via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> My read is that Henrique wants to specify a job to require a variable
> number of CPUs on one node, so that when the job is at the front of the
> queue, it will run opportunistically on however many happen to be available
> on a single node as long as there are at least five.
>
> I don't personally know of a way to specify such a job, and wouldn't be
> surprised if there isn't one, since as other posters have suggested,
> usually there's a core-count sweet spot that should be used, achieving a
> performance goal while making efficient use of resources.  A cluster
> administrator may in fact not want you using extra cores, even if there's a
> bit more speed-up to be had, when those cores could be used more
> efficiently by another job.  I'm also not sure how one would set a
> judicious TimeLimit on a job that would have such a variable wall-time.
>
> So there is the question of whether it is possible, and whether it is
> advisable.
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Seeking Commercial SLURM Subscription Provider

2024-08-13 Thread Davide DelVento via slurm-users
How about SchedMD itself? They are the ones doing most (if not all) of the
development, and they are great.
In my experience, the best options are either SchedMD or the vendor of your
hardware.

On Mon, Aug 12, 2024 at 11:17 PM John Joseph via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Dear All,
>
> Good morning.
>
> We successfully implemented a 4-node SLURM cluster with shared storage
> using GlusterFS and were able to run COMSOL programs on it. After this
> learning experience, we've determined that it would be beneficial to switch
> to a commercial SLURM subscription for better support.
>
> We are currently seeking a solution provider who can offer support based
> on a commercial subscription. I would like to reach out to the group for
> recommendations or advice on how we can avail these services commercially.
> Thank you.
> Joseph John
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Print Slurm Stats on Login

2024-08-13 Thread Davide DelVento via slurm-users
I too would be interested in some lightweight scripts. XDMOD in my
experience takes a lot of work to install, maintain and learn. It's great if
one needs that level of interactivity, granularity and detail, but for a
"quick and dirty" summary in a small dept it's not only overkill, it's also
not feasible with the available staffing.

On Fri, Aug 9, 2024 at 10:31 AM Paul Edmon via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Yup, we have that installed already. It's been very beneficial for over
> all monitoring.
>
> -Paul Edmon-
>
> On 8/9/2024 12:27 PM, Reid, Andrew C.E. (Fed) wrote:
> >Maybe a heavier lift than you had in mind, but check
> > out xdmod, open.xdmod.org.
> >
> >It was developed by the NSF as part of the now-shuttered
> > XSEDE program, and is useful for both system and user monitoring.
> >
> >-- A.
> >
> > On Fri, Aug 09, 2024 at 12:12:08PM -0400, Paul Edmon via slurm-users
> wrote:
> >> Yeah, I was contemplating doing that so I didn't have a dependency on
> the
> >> scheduler being up or down or busy.
> >>
> >> What I was more curious about is if any one had an prebaked scripts for
> >> that.
> >>
> >> -Paul Edmon-
> >>
> >> On 8/9/2024 12:04 PM, Jeffrey T Frey wrote:
> >>> You'd have to do this within e.g. the system's bashrc infrastructure.
> The simplest idea would be to add to e.g. /etc/profile.d/zzz-slurmstats.sh
> and have some canned commands/scripts running.  That does introduce load to
> the system and Slurm on every login, though, and slows the startup of login
> shells based on how responsive slurmctld/slurmdbd are at that moment.
> >>>
> >>> Another option would be to run the commands/scripts for all users on
> some timed schedule — e.g. produce per-user stats every 30 minutes.  So
> long as the stats are publicly-visible anyway, put those summaries in a
> shared file system with open read access.  Name the files by uid number.
> Now your /etc/profile.d script just cat's ${STATS_DIR}/$(id -u).
> >>>
> >>>
> >>>
> >>>
>  On Aug 9, 2024, at 11:11, Paul Edmon via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
> 
>  We are working to make our users more aware of their usage. One of
> the ideas we came up with was to having some basic usage stats printed at
> login (usage over past day, fairshare, job efficiency, etc). Does anyone
> have any scripts or methods that they use to do this? Before baking my own
> I was curious what other sites do and if they would be willing to share
> their scripts and methodology.
> 
>  -Paul Edmon-
> 
> 
>  --
>  slurm-users mailing list -- slurm-users@lists.schedmd.com
>  To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
> >> --
> >> slurm-users mailing list -- slurm-users@lists.schedmd.com
> >> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Print Slurm Stats on Login

2024-08-14 Thread Davide DelVento via slurm-users
This is wonderful, thanks Josef and Ole! I will need to familiarize myself
with it, but at a cursory glance it looks like almost exactly what I was
looking for!

On Wed, Aug 14, 2024 at 1:44 AM Josef Dvořáček via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> > I too would be interested in some lightweight scripts
>
> For lightweight stats I tend to use this excellent script: slurmacct.
> Author is member of this mailinglist too. (hi):
>
>
> https://github.com/OleHolmNielsen/Slurm_tools/blob/master/slurmacct/slurmacct
>
> Currently I am in process of writing prometheus exporter as the one I've
> used for years (https://github.com/vpenso/prometheus-slurm-exporter)
> provides suboptimal results with Slurm 24.04+.
> (we use looong job arrays at our system breaking somehow the exporter,
> which is parsing text output of squeue command)
>
> cheers
>
> josef
>
> --
> *From:* Davide DelVento via slurm-users 
> *Sent:* Wednesday, 14 August 2024 01:52
> *To:* Paul Edmon 
> *Cc:* Reid, Andrew C.E. (Fed) ; Jeffrey T Frey <
> f...@udel.edu>; slurm-users@lists.schedmd.com <
> slurm-users@lists.schedmd.com>
> *Subject:* [slurm-users] Re: Print Slurm Stats on Login
>
> I too would be interested in some lightweight scripts. XDMOD in my
> experience has been very intense in workload to install, maintain and
> learn. It's great if one needs that level of interactivity, granularity and
> detail, but for some "quick and dirty" summary in a small dept it's not
> only overkill, it's also impossible given the available staffing.
> ...
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Unable to run sequential jobs simultaneously on the same node

2024-08-19 Thread Davide DelVento via slurm-users
Since each instance of the program is independent and you are using one
core for each, it'd be better to let slurm deal with that and schedule
them concurrently as it sees fit. Maybe you simply need to add some
directive to allow shared jobs on the same node.
Alternatively (if at your site jobs must be exclusive) you should check
what their recommended way to do this is. Some sites prefer dask, some
others an MPI-based serial-job consolidation (often called a "command
file"), some others a technique similar to what you are doing; but instead
of reinventing the wheel I suggest checking what your site recommends in
this situation.
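
For instance, if shared nodes are allowed, a job-array sketch along these
lines (script name and parameter handling are placeholders) lets slurm pack
the 50 independent runs wherever cores are free:

  #SBATCH --array=1-50
  #SBATCH --ntasks=1
  #SBATCH --mem-per-cpu=1G
  srun ./a_1.out ${SLURM_ARRAY_TASK_ID}

Each array element is then an independent one-core job, so none of them
should take the others down.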

On Mon, Aug 19, 2024 at 2:24 AM Arko Roy via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Dear Loris,
>
> I just checked removing the &
> it didn't work.
>
> On Mon, Aug 19, 2024 at 1:43 PM Loris Bennett 
> wrote:
>
>> Dear Arko,
>>
>> Arko Roy  writes:
>>
>> > Thanks Loris and Gareth. here is the job submission script. if you find
>> any errors please let me know.
>> > since i am not the admin but just an user, i think i dont have access
>> to the prolog and epilogue files.
>> >
>> > If the jobs are independent, why do you want to run them all on the same
>> > node?
>> > I am running sequential codes. Essentially 50 copies of the same node
>> with a variation in parameter.
>> > Since I am using the Slurm scheduler, the nodes and cores are allocated
>> depending upon the
>> > available resources. So there are instances, when 20 of them goes to 20
>> free cores located on a particular
>> > node and the rest 30 goes to the free 30 cores on another node. It
>> turns out that only 1 job out of 20 and 1 job
>> > out of 30 are completed succesfully with exitcode 0 and the rest gets
>> terminated with exitcode 9.
>> > for information, i run sjobexitmod -l jobid to check the exitcodes.
>> >
>> > --
>> > the submission script is as follows:
>> >
>> > #!/bin/bash
>> > 
>> > # Setting slurm options
>> > 
>> >
>> > # lines starting with "#SBATCH" define your jobs parameters
>> > # requesting the type of node on which to run job
>> > ##SBATCH --partition 
>> > #SBATCH --partition=standard
>> >
>> > # telling slurm how many instances of this job to spawn (typically 1)
>> >
>> > ##SBATCH --ntasks 
>> > ##SBATCH --ntasks=1
>> > #SBATCH --nodes=1
>> > ##SBATCH -N 1
>> > ##SBATCH --ntasks-per-node=1
>> >
>> > # setting number of CPUs per task (1 for serial jobs)
>> >
>> > ##SBATCH --cpus-per-task 
>> >
>> > ##SBATCH --cpus-per-task=1
>> >
>> > # setting memory requirements
>> >
>> > ##SBATCH --mem-per-cpu 
>> > #SBATCH --mem-per-cpu=1G
>> >
>> > # propagating max time for job to run
>> >
>> > ##SBATCH --time 
>> > ##SBATCH --time 
>> > ##SBATCH --time 
>> > #SBATCH --time 10:0:0
>> > #SBATCH --job-name gstate
>> >
>> > #module load compiler/intel/2018_4
>> > module load fftw-3.3.10-intel-2021.6.0-ppbepka
>> > echo "Running on $(hostname)"
>> > echo "We are in $(pwd)"
>> >
>> > 
>> > # run the program
>> > 
>> > /home/arkoroy.sps.iitmandi/ferro-detun/input1/a_1.out &
>>
>> You should not write
>>
>>   &
>>
>> at the end of the above command.  This will run your program in the
>> background, which will cause the submit script to terminate, which in
>> turn will terminate your job.
>>
>> Regards
>>
>> Loris
>>
>> --
>> Dr. Loris Bennett (Herr/Mr)
>> FUB-IT, Freie Universität Berlin
>>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Print Slurm Stats on Login

2024-08-20 Thread Davide DelVento via slurm-users
Thanks Kevin and Simon,

The full thing that you do is indeed overkill, however I was able to learn
how to collect/parse some of the information I need.

What I am still unable to get is:

- utilization by queue (or list of node names), to track actual use of
expensive resources such as GPUs, high memory nodes, etc
- statistics about wait-in-queue for jobs, due to unavailable resources

hopefully both in a sreport-like format by user and by overall system

I suspect this information is available in sacct, but needs some
massaging/consolidation to become useful for what I am looking for. Perhaps
either (or both) of your scripts already do that in some place that I did
not find? That would be terrific, and I'd appreciate it if you can point me
to its place.

Thanks again!

On Tue, Aug 20, 2024 at 9:09 AM Kevin Broch via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Heavyweight solution (although if you have grafana and prometheus going
> already a little less so):
> https://github.com/rivosinc/prometheus-slurm-exporter
>
> On Tue, Aug 20, 2024 at 12:40 AM Simon Andrews via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> Possibly a bit more elaborate than you want but I wrote a web based
>> monitoring system for our cluster.  It mostly uses standard slurm commands
>> for job monitoring, but I've also added storage monitoring which requires a
>> separate cron job to run every night.  It was written for our cluster, but
>> probably wouldn't take much work to adapt to another cluster with similar
>> structure.
>>
>> You can see the code and some screenshots at:
>>
>>  https://github.com/s-andrews/capstone_monitor
>>
>> ..and there's a video walk through at:
>>
>> https://vimeo.com/982985174
>>
>> We've also got more friendly scripts for monitoring current and past jobs
>> on the command line.  These are in a private repository as some of the
>> other information there is more sensitive but I'm happy to share those
>> scripts.  You can see the scripts being used in
>> https://vimeo.com/982986202
>>
>> Simon.
>>
>> -Original Message-
>> From: Paul Edmon via slurm-users 
>> Sent: 09 August 2024 16:12
>> To: slurm-users@lists.schedmd.com
>> Subject: [slurm-users] Print Slurm Stats on Login
>>
>> We are working to make our users more aware of their usage. One of the
>> ideas we came up with was to having some basic usage stats printed at login
>> (usage over past day, fairshare, job efficiency, etc). Does anyone have any
>> scripts or methods that they use to do this? Before baking my own I was
>> curious what other sites do and if they would be willing to share their
>> scripts and methodology.
>>
>> -Paul Edmon-
>>
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe
>> send an email to slurm-users-le...@lists.schedmd.com
>>
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Print Slurm Stats on Login

2024-08-21 Thread Davide DelVento via slurm-users
Thanks, Ole! Your tools and what you do for the community are fantastic; we
all appreciate you!

Of course, I did look (and use) your script. But I need more info.

And no, this is not something that users would run *ever* (let alone at
every login). This is something I *myself* (the cluster administrator) need
to run once a quarter, or perhaps even just once a year, to inform my
managers of cluster utilization, keep them apprised of the state of affairs,
and justify changes in funding for future hardware purchases. Sorry for not
making this clear, given the initial message I replied to.

Thanks for any suggestion you might have.

On Wed, Aug 21, 2024 at 12:19 AM Ole Holm Nielsen via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi Davide,
>
> Did you already check out what the slurmacct script can do for you?  See
>
> https://github.com/OleHolmNielsen/Slurm_tools/blob/master/slurmacct/slurmacct
>
> What you're asking for seems like a pretty heavy task regarding system
> resources and Slurm database requests.  You don't imagine this to run
> every time a user makes a login shell?  Some users might run "bash -l"
> inside jobs to emulate a login session, causing a heavy load on your
> servers.
>
> /Ole
>
> On 8/21/24 01:13, Davide DelVento via slurm-users wrote:
> > Thanks Kevin and Simon,
> >
> > The full thing that you do is indeed overkill, however I was able to
> learn
> > how to collect/parse some of the information I need.
> >
> > What I am still unable to get is:
> >
> > - utilization by queue (or list of node names), to track actual use of
> > expensive resources such as GPUs, high memory nodes, etc
> > - statistics about wait-in-queue for jobs, due to unavailable resources
> >
> > hopefully both in a sreport-like format by user and by overall system
> >
> > I suspect this information is available in sacct, but needs some
> > massaging/consolidation to become useful for what I am looking for.
> > Perhaps either (or both) of your scripts already do that in some place
> > that I did not find? That would be terrific, and I'd appreciate it if
> you
> > can point me to its place.
> >
> > Thanks again!
> >
> > On Tue, Aug 20, 2024 at 9:09 AM Kevin Broch via slurm-users
> > mailto:slurm-users@lists.schedmd.com>>
> wrote:
> >
> > Heavyweight solution (although if you have grafana and prometheus
> > going already a little less so):
> > https://github.com/rivosinc/prometheus-slurm-exporter
> > <https://github.com/rivosinc/prometheus-slurm-exporter>
> >
> > On Tue, Aug 20, 2024 at 12:40 AM Simon Andrews via slurm-users
> > mailto:slurm-users@lists.schedmd.com
> >>
> > wrote:
> >
> > Possibly a bit more elaborate than you want but I wrote a web
> > based monitoring system for our cluster.  It mostly uses standard
> > slurm commands for job monitoring, but I've also added storage
> > monitoring which requires a separate cron job to run every
> night.
> > It was written for our cluster, but probably wouldn't take much
> > work to adapt to another cluster with similar structure.
> >
> > You can see the code and some screenshots at:
> >
> > https://github.com/s-andrews/capstone_monitor
> > <https://github.com/s-andrews/capstone_monitor>
> >
> > ..and there's a video walk through at:
> >
> > https://vimeo.com/982985174 <https://vimeo.com/982985174>
> >
> > We've also got more friendly scripts for monitoring current and
> > past jobs on the command line.  These are in a private repository
> > as some of the other information there is more sensitive but I'm
> > happy to share those scripts.  You can see the scripts being used
> > in https://vimeo.com/982986202 <https://vimeo.com/982986202>
> >
> > Simon.
> >
> > -Original Message-
> > From: Paul Edmon via slurm-users  > <mailto:slurm-users@lists.schedmd.com>>
> > Sent: 09 August 2024 16:12
> > To: slurm-users@lists.schedmd.com
> > <mailto:slurm-users@lists.schedmd.com>
> > Subject: [slurm-users] Print Slurm Stats on Login
> >
> > We are working to make our users more aware of their usage. One
> of
> > the ideas we came up with was to having some basic usage stats
> > printed at login (usage over pa

[slurm-users] Re: Print Slurm Stats on Login

2024-08-21 Thread Davide DelVento via slurm-users
Hi Ole,

On Wed, Aug 21, 2024 at 1:06 PM Ole Holm Nielsen via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> The slurmacct script can actually break down statistics by partition,
> which I guess is what you're asking for?  The usage of the command is:
>

Yes, this is almost what I was asking for. And admittedly I now realize
that with perhaps some minor algebra (using the TOTAL-all line) I could get
what I need. What confused me is that running it for everything or for one
partition reported the same cluster-utilization header, rather than a
partition-specific one:

[davide ~]$ slurmacct -s 0101 -e 0202
Start date 0101
End date 0202
Report generated to file /tmp/Slurm_report_acct_0101_0202
[davide ~]$  cat /tmp/Slurm_report_acct_0101_0202

Cluster Utilization 01-Jan-2024_00:00 - 01-Feb-2024_23:59
Usage reported in Percentage of Total

  Cluster  Allocated   Down PLND Dow  Idle  Planned   Reported
- -- --  -  --
  cluster 23.25% 67.85%0.00% 8.89%0.01%100.00%

Usage sorted by top users:
(omitted)


[davide ~]$ slurmacct -p gpu -s 0101 -e 0202
Start date 0101
End date 0202
Print only accounting in Slurm partition gpu
Report generated to file /tmp/Slurm_report_acct_0101_0202
[davide ~]$ cat /tmp/Slurm_report_acct_0101_0202

Cluster Utilization 01-Jan-2024_00:00 - 01-Feb-2024_23:59
Usage reported in Percentage of Total

  Cluster  Allocated   Down PLND Dow  Idle  Planned   Reported
- -- --  -  --
  cluster 23.25% 67.85%0.00% 8.89%0.01%100.00%

Partition selected: gpu
Usage sorted by top users:
(omitted)

Also, what you label "Wallclock hours" in the table of users is actually
core-hours (not node-hours), correct? Otherwise I am reading things
incorrectly.


The Start_time and End_time values specify the date/time interval of
> job completion/termination (see "man sacct").
>
> Hint: Specify Start/End time as MMDD (Month and Date)
>

Small suggestion: change this to

Hint: Specify Start/End time as MMDD (Month and Day) or MMDDYY (Month, Day
and Year), since sreport accepts it and your tool appears to otherwise
understand that format.



> >  > - statistics about wait-in-queue for jobs, due to unavailable
> > resources
>
> The slurmacct report prints "Average q-hours" (starttime minus submittime).
>

Ahaha! That's it! Super useful, I was wondering what "q" was
(wait-in-Queue, I guess). You are super.

We use the "topreports" script to gather weekly, monthly and yearly
> reports (using slurmacct) for management (professors at our university).
>

I knew that I must not have been the only one with this need ;-)

Thanks again!

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Slurmdbd purge and reported downtime

2024-08-22 Thread Davide DelVento via slurm-users
I am confused by the reported amount of Down and PLND Down by sreport.
According to it, our cluster would have had a significant amount of
downtime, which I know didn't happen (or, according to the documentation
"time that slurmctld was not responding", see
https://slurm.schedmd.com/sreport.html)

Could it be my purge settings causing this problem? How can I check (maybe
in some logs, maybe in the future) if actually slurmctld was not
responding? The expected long-term numbers should be less than the ones
reported for last month when we had an issue with a few nodes

Thanks!


[davide@login ~]$ grep Purge /opt/slurm/slurmdbd.conf
#JobPurge=12
#StepPurge=1
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month

[davide@login ~]$ sreport -t percent -T cpu,mem cluster utilization
start=2/1/22

Cluster Utilization 2022-02-01T00:00:00 - 2024-08-21T23:59:59
Usage reported in Percentage of Total

  Cluster  TRES Name  Allocated    Down  PLND Down    Idle  Planned  Reported
---------  ---------  ---------  ------  ---------  ------  -------  --------
  cluster  cpu           19.50%  12.07%      3.92%  64.36%    0.15%   100.03%
  cluster  mem           16.13%  13.17%      4.56%  66.13%    0.00%    99.99%

[davide@login ~]$sreport -t percent -T cpu,mem cluster utilization
start=2/1/23

Cluster Utilization 2023-02-01T00:00:00 - 2024-08-21T23:59:59
Usage reported in Percentage of Total

  Cluster  TRES Name  Allocated    Down  PLND Down    Idle  Planned  Reported
---------  ---------  ---------  ------  ---------  ------  -------  --------
  cluster  cpu           28.74%  18.80%      6.44%  45.77%    0.24%   100.02%
  cluster  mem           22.52%  20.54%      7.38%  49.55%    0.00%    99.98%

[davide@login ~]$  sreport -t percent -T cpu,mem cluster utilization
start=2/1/24

Cluster Utilization 2024-02-01T00:00:00 - 2024-08-21T23:59:59
Usage reported in Percentage of Total

  Cluster  TRES Name  Allocated    Down  PLND Down    Idle  Planned  Reported
---------  ---------  ---------  ------  ---------  ------  -------  --------
  cluster  cpu           29.92%  24.88%     17.73%  27.45%    0.02%   100.00%
  cluster  mem           20.07%  28.60%     19.57%  31.76%    0.00%   100.00%

[davide@login ~]$  sreport -t percent -T cpu,mem cluster utilization
start=8/8/24

Cluster Utilization 2024-08-08T00:00:00 - 2024-08-21T23:59:59
Usage reported in Percentage of Total

  Cluster  TRES Name  Allocated    Down  PLND Down    Idle  Planned  Reported
---------  ---------  ---------  ------  ---------  ------  -------  --------
  cluster  cpu           15.96%   2.53%      0.00%  81.51%    0.00%   100.00%
  cluster  mem            9.18%   2.22%      0.00%  88.60%    0.00%   100.00%

[davide@login ~]$  sreport -t percent -T cpu,mem cluster utilization
start=7/7/24

Cluster Utilization 2024-07-07T00:00:00 - 2024-08-21T23:59:59
Usage reported in Percentage of Total

  Cluster  TRES Name  Allocated    Down  PLND Down    Idle  Planned  Reported
---------  ---------  ---------  ------  ---------  ------  -------  --------
  cluster  cpu           27.07%   2.57%      0.00%  70.34%    0.02%   100.00%
  cluster  mem           17.35%   2.26%      0.00%  80.40%    0.00%   100.00%

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Slurmdbd purge and reported downtime

2024-08-23 Thread Davide DelVento via slurm-users
Thanks Ole,
this is very helpful. I was unaware of that issue. From the bug report it's
not clear to me if it was just a sreport (display) issue, or if the problem
was in the way the data was stored.

In fact I am running 23.11.5 which I installed in April. The numbers I see
for the last few months (including April) are fine. The earlier numbers
(when I was running an earlier version) are the ones affected by this
problem. So if the issue was the way the data was stored, that explains it
and I can live with it (even if I can't provide an accurate report for my
management now) knowing that the problem won't happen again in the future.

Thanks and have a great weekend

On Fri, Aug 23, 2024 at 8:00 AM Ole Holm Nielsen via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi Davide,
>
> On 8/22/24 21:30, Davide DelVento via slurm-users wrote:
> > I am confused by the reported amount of Down and PLND Down by sreport.
> > According to it, our cluster would have had a significant amount of
> > downtime, which I know didn't happen (or, according to the documentation
> > "time that slurmctld was not responding", see
> > https://slurm.schedmd.com/sreport.html
> > <https://slurm.schedmd.com/sreport.html>)
> >
> > Could it be my purge settings causing this problem? How can I check
> (maybe
> > in some logs, maybe in the future) if actually slurmctld was not
> > responding? The expected long-term numbers should be less than the ones
> > reported for last month when we had an issue with a few nodes
>
> Which version of Slurm are you using?  There was an sreport bug that
> should be fixed in 23.11:
> https://support.schedmd.com/show_bug.cgi?id=17689
>
> /Ole
>
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Spread a multistep job across clusters

2024-08-26 Thread Davide DelVento via slurm-users
Ciao Fabio,

That is for sure syntactically incorrect, because of the way sbatch parsing
works: as soon as it finds a non-empty non-comment line (your first srun) it
stops parsing for #SBATCH directives. So, assuming this is a single file as
it looks from the formatting, the second hetjob and the cluster3 are
ignored. And if these are two separate files, they would be two separate
jobs, so that's not going to work either.
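
For reference, the usual single-file layout puts all the directives at the
top, something like this sketch (whether --clusters is honored per component
across a federation I cannot say):

  #!/bin/bash
  #SBATCH --clusters=cluster2
  #SBATCH hetjob
  #SBATCH --clusters=cluster3

  srun --het-group=0 hostname &
  srun --het-group=1 hostname
  wait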

More specifically to your question, I can't help because I don't have
experience with federated clusters.

On Mon, Aug 26, 2024 at 9:43 AM Di Bernardini, Fabio via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi everyone, for accounting reasons, I need to create only one job across
> two or more federated clusters with two or more srun steps.
>
> I’m trying with hetjobs but it's not clear to me from the documentation (
> https://slurm.schedmd.com/heterogeneous_jobs.html) if this is possible
> and how to do it.
>
> I'm trying with this script, but the steps are executed on only the first
> cluster.
>
> Can you tell me if there is a mistake in the hetjob or if it has to be
> done in another way?
>
>
>
> #!/bin/bash
>
>
>
> #SBATCH hetjob
>
> #SBATCH --clusters=cluster2
>
> srun -v --het-group=0 hostname
>
>
>
> #SBATCH hetjob
>
> #SBATCH --clusters=cluster3
>
> srun -v --het-group=1 hostname
>
>
>
> NICE SRL, viale Monte Grappa 3/5, 20124 Milano, Italia, Registro delle
> Imprese di Milano Monza Brianza Lodi REA n. 2096882, Capitale Sociale:
> 10.329,14 EUR i.v., Cod. Fisc. e P.IVA 01133050052, Societa con Socio Unico
>
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Print Slurm Stats on Login

2024-08-28 Thread Davide DelVento via slurm-users
Thanks everybody once again and especially Paul: your job_summary script
was exactly what I needed, served on a silver platter. I just had to
modify/customize the date range and change the following line (I can make a
PR if you want, but it's such a small change that it'd take more time to
deal with the PR than just typing it)

-    Timelimit = time_to_float(Timelimit.replace('UNLIMITED','365-00:00:00'))
+    Timelimit = time_to_float(Timelimit.replace('UNLIMITED','365-00:00:00').replace('Partition_Limit','365-00:00:00'))

Cheers,
Davide


On Tue, Aug 27, 2024 at 1:40 PM Paul Edmon via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> This thread when a bunch of different directions. However I ran with
> Jeffrey's suggestion and wrote up a profile.d script along with other
> supporting scripts to pull the data. The setup I put together is here
> for the community to use as they see fit:
>
> https://github.com/fasrc/puppet-slurm_stats
>
> While this is written as a puppet module the scripts there in can be
> used by anyone as its a pretty straightforward set up and the templates
> have obvious places to do a find and replace.
>
> Naturally I'm happy to take additional merge requests. Thanks for all
> the interesting conversation about this. Lots of great ideas.
>
> -Paul Edmon-
>
> On 8/9/24 12:04 PM, Jeffrey T Frey wrote:
> > You'd have to do this within e.g. the system's bashrc infrastructure.
> The simplest idea would be to add to e.g. /etc/profile.d/zzz-slurmstats.sh
> and have some canned commands/scripts running.  That does introduce load to
> the system and Slurm on every login, though, and slows the startup of login
> shells based on how responsive slurmctld/slurmdbd are at that moment.
> >
> > Another option would be to run the commands/scripts for all users on
> some timed schedule — e.g. produce per-user stats every 30 minutes.  So
> long as the stats are publicly-visible anyway, put those summaries in a
> shared file system with open read access.  Name the files by uid number.
> Now your /etc/profile.d script just cat's ${STATS_DIR}/$(id -u).
> >
> >
> >
> >
> >> On Aug 9, 2024, at 11:11, Paul Edmon via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
> >>
> >> We are working to make our users more aware of their usage. One of the
> ideas we came up with was to having some basic usage stats printed at login
> (usage over past day, fairshare, job efficiency, etc). Does anyone have any
> scripts or methods that they use to do this? Before baking my own I was
> curious what other sites do and if they would be willing to share their
> scripts and methodology.
> >>
> >> -Paul Edmon-
> >>
> >>
> >> --
> >> slurm-users mailing list -- slurm-users@lists.schedmd.com
> >> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Job pre / post submit scripts

2024-10-28 Thread Davide DelVento via slurm-users
Not sure if I understand your use case, but if I do I am not sure if Slurm
provides that functionality.
If it doesn't (and if my understanding is correct), you can still achieve
your goal by:

1) removing sbatch and salloc from user's path
2) writing your own custom script named sbatch (and hard/symbolic linking it
to salloc) which does what you want and then (using $0 or something similar
to tell which name it was invoked as) calls the real sbatch or salloc by full
path -- it could also do its custom work after, rather than before, the real
command, if you so prefer (a rough sketch below)
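
Something like this, completely untested, where pre_submit_hook is a made-up
name for whatever your custom logic is and the paths are only placeholders:

#!/bin/bash
# Installed as e.g. /opt/site/bin/sbatch and hard/sym-linked as /opt/site/bin/salloc;
# /opt/site/bin must come before the real binaries in the users' PATH.
cmd=$(basename "$0")                  # "sbatch" or "salloc", depending on which link was called
/opt/site/bin/pre_submit_hook "$@"    # your custom step, runs as the submitting user
exec "/usr/bin/${cmd}" "$@"           # then hand off to the real command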

Hope this helps

On Mon, Oct 28, 2024 at 11:59 AM Bhaskar Chakraborty via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi,
>
> Is there an option in slurm to launch a custom script at the time of job
> submission through sbatch
> or salloc? The script should run with submit user permission in submit
> area.
>
> The idea is that we need to enquire something which characterises our
> job’s requirement like CPU
> slots, memory etc from a central server and we do need read access to user
> area prior to that.
>
> In our use case the user doesn’t necessarily know beforehand what kind of
> resource his job needs.
> (Hence, the need for such a script which will contact the server with user
> area info.)
>
> Based on it we can modify the job a little later. A post submit script, if
> available, would inform us the slurm job id as well, it would get called
> just after the job has entered the system and prior to its scheduling.
>
> Thanks,
> Bhaskar.
>
> Sent from Yahoo Mail for iPad
> 
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Change primary alloc node

2024-10-31 Thread Davide DelVento via slurm-users
Another possible use case of this is a regular MPI job where the
first/controller task often uses more memory than the workers and may need
to be scheduled on a higher memory node than them. I think I saw this
happening in the past, but I'm not 100% sure it was in Slurm or some other
scheduling system, and I've lost any reference to it (I would be
interested to find out if this is possible with Slurm, and if so how)
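
If someone wants to experiment, my untested guess is that a heterogeneous job
could express it, along these lines (all numbers invented, and whether the MPI
library builds a single communicator across the two components depends on the
MPI stack):

#!/bin/bash
#SBATCH --ntasks=1 --mem=64G           # component 0: the rank-0/controller task
#SBATCH hetjob
#SBATCH --ntasks=63 --mem-per-cpu=4G   # component 1: the worker ranks
srun --het-group=0,1 ./mpi_solver      # one launch spanning both components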

On Thu, Oct 31, 2024 at 1:10 AM Bhaskar Chakraborty via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hello,
>
> Just to add some context here. We plan to use slurm for developing a sched
> solution which interacts with a backend system.
>
> Now, the backend system has pieces of h/w which require specific host in
> the allocation to be the primary/master host wherein the initial task would
> be launched, this in turn is driven by the job's placement orientation on
> the h/w itself.
>
> So, our primary task should launch in the asked primary host while
> secondary / remote tasks would subsequently get started on other hosts.
>
> Hope this brings some context to the problem as to why a specific host is
> necessary to be the starting host.
>
> Regards,
> Bhaskar.
>
> On Thursday 31 October, 2024 at 12:04:37 am IST, Laura Hild 
> wrote:
>
>
> I think if you tell the list why you care which of the Nodes is BatchHost,
> they may be able to provide you with a better solution.
>
>
> 
>
> From: Bhaskar Chakraborty via slurm-users 
> Sent: Wednesday, 30 October 2024 12:35
> To: slurm-us...@schedmd.com
> Subject: [slurm-users] Change primary alloc node
>
> Hi,
>
> Is there a way to change/control the primary node (i.e. where the initial
> task starts) as part of a job's allocation.
>
> For eg, if a job requires 6 CPUs & its allocation is distributed over 3
> hosts h1, h2 & h3 I find that it always starts the task in 1 particular
> node (say h1) irrespective of how many slots were available in the hosts.
>
> Can we somehow let slurm have the primary node as h2?
>
> Is there any C-API inside select plugin which can do this trick if we were
> to control it through the configured select plugin?
>
> Thanks.
> -Bhaskar.
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: errors compiling Slurm 18 on RHEL 9: [Makefile:577: scancel] Error 1 & It's not recommended to have unversioned Obsoletes

2024-09-27 Thread Davide DelVento via slurm-users
Slurm 18? Isn't that a bit outdated?

On Fri, Sep 27, 2024 at 9:41 AM Robert Kudyba via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> We're in the process of upgrading but first we're moving to RHEL 9. My
> attempt to compile using rpmbuild -v -ta --define "_lto_cflags %{nil}"
>  slurm-18.08.9.tar.bz2 (H/T to Brian for this flag
> ).
> I've stumped Google and the Slurm mailing list with the scancel error so
> hoping someone here knows of a work around.
>
> /bin/ld:
> opt.o:/root/rpmbuild/BUILD/slurm-18.08.9/src/scancel/../../src/scancel/scancel.h:78:
> multiple definition of `opt';
> scancel.o:/root/rpmbuild/BUILD/slurm-18.08.9/src/scancel/../../src/scancel/scancel.h:78:
> first defined here
> collect2: error: ld returned 1 exit status
> make[3]: *** [Makefile:577: scancel] Error 1
> make[3]: Leaving directory '/root/rpmbuild/BUILD/slurm-18.08.9/src/scancel'
> make[2]: *** [Makefile:563: all-recursive] Error 1
> make[2]: Leaving directory '/root/rpmbuild/BUILD/slurm-18.08.9/src'
> make[1]: *** [Makefile:690: all-recursive] Error 1
> make[1]: Leaving directory '/root/rpmbuild/BUILD/slurm-18.08.9'
> make: *** [Makefile:589: all] Error 2
> error: Bad exit status from /var/tmp/rpm-tmp.jhiGyR (%build)
>
>
> RPM build errors:
> Macro expanded in comment on line 22: %_prefix path install path for
> commands, libraries, etc.
>
> line 70: It's not recommended to have unversioned Obsoletes:
> Obsoletes: slurm-lua slurm-munge slurm-plugins
> Macro expanded in comment on line 158: %define
> _unpackaged_files_terminate_build  0
>
> line 224: It's not recommended to have unversioned Obsoletes:
> Obsoletes: slurm-sql
> line 256: It's not recommended to have unversioned Obsoletes:
> Obsoletes: slurm-sjobexit slurm-sjstat slurm-seff
> line 275: It's not recommended to have unversioned Obsoletes:
> Obsoletes: pam_slurm
> Bad exit status from /var/tmp/rpm-tmp.jhiGyR (%build)
>
> #!/bin/sh
>
>   RPM_SOURCE_DIR="/root"
>   RPM_BUILD_DIR="/root/rpmbuild/BUILD"
>   RPM_OPT_FLAGS="-O2  -fexceptions -g -grecord-gcc-switches -pipe -Wall
> -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
> "-Wl,-z,lazy" -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64
> -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables
> -fstack-clash-protection -fcf-protection"
>   RPM_LD_FLAGS="-Wl,-z,relro -Wl,--as-needed  "-Wl,-z,lazy"
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 "
>   RPM_ARCH="x86_64"
>   RPM_OS="linux"
>   RPM_BUILD_NCPUS="48"
>   export RPM_SOURCE_DIR RPM_BUILD_DIR RPM_OPT_FLAGS RPM_LD_FLAGS RPM_ARCH
> RPM_OS RPM_BUILD_NCPUS RPM_LD_FLAGS
>   RPM_DOC_DIR="/usr/share/doc"
>   export RPM_DOC_DIR
>   RPM_PACKAGE_NAME="slurm"
>   RPM_PACKAGE_VERSION="18.08.9"
>   RPM_PACKAGE_RELEASE="1.el9"
>   export RPM_PACKAGE_NAME RPM_PACKAGE_VERSION RPM_PACKAGE_RELEASE
>   LANG=C
>   export LANG
>   unset CDPATH DISPLAY ||:
>   RPM_BUILD_ROOT="/root/rpmbuild/BUILDROOT/slurm-18.08.9-1.el9.x86_64"
>   export RPM_BUILD_ROOT
>
>
> PKG_CONFIG_PATH="${PKG_CONFIG_PATH}:/usr/lib64/pkgconfig:/usr/share/pkgconfig"
>   export PKG_CONFIG_PATH
>   CONFIG_SITE=${CONFIG_SITE:-NONE}
>   export CONFIG_SITE
>
>   set -x
>   umask 022
>   cd "/root/rpmbuild/BUILD"
> cd 'slurm-18.08.9'
>
>
>   CFLAGS="${CFLAGS:--O2  -fexceptions -g -grecord-gcc-switches -pipe -Wall
> -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
> "-Wl,-z,lazy" -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64
> -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables
> -fstack-clash-protection -fcf-protection}" ; export CFLAGS ;
>   CXXFLAGS="${CXXFLAGS:--O2  -fexceptions -g -grecord-gcc-switches -pipe
> -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> -Wp,-D_GLIBCXX_ASSERTIONS "-Wl,-z,lazy"
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64-v2
> -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection
> -fcf-protection}" ; export CXXFLAGS ;
>   FFLAGS="${FFLAGS:--O2  -fexceptions -g -grecord-gcc-switches -pipe -Wall
> -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
> "-Wl,-z,lazy" -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64
> -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables
> -fstack-clash-protection -fcf-protection -I/usr/lib64/gfortran/modules}" ;
> export FFLAGS ;
>   FCFLAGS="${FCFLAGS:--O2  -fexceptions -g -grecord-gcc-switches -pipe
> -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> -Wp,-D_GLIBCXX_ASSERTIONS "-Wl,-z,lazy"
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64-v2
> -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection
> -fcf-protection -I/usr/lib64/gfortran/modules}" ; export FCFLAGS ;
>   LDFLAGS="${LDFLAGS:--Wl,-z,relro -Wl,--as-needed  "-Wl,-z,lazy"
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 }" ; export LDFLAGS ;
>   LT_SYS_LIBRARY_PATH="${LT_SYS_LIBRARY_PATH:-/usr/lib64:}" ; expor

[slurm-users] Re: error and output files

2024-12-09 Thread Davide DelVento via slurm-users
Mmmm, from https://slurm.schedmd.com/sbatch.html

> By default both standard output and standard error are directed to a file
of the name "slurm-%j.out", where the "%j" is replaced with the job
allocation number.

Perhaps at your site there's a configuration which uses separate error
files? See the -e option in the documentation at the URL mentioned above.
You can specify the same filename for both output and error to force them
to go into the same file.
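
For example (the filename is just an illustration):

#SBATCH --output=job-%j.log
#SBATCH --error=job-%j.log      # same file as --output, so the two streams are merged

(or point --output at /dev/null if you truly want to keep only the error stream).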

On Sun, Dec 8, 2024 at 7:39 PM michaelmorgan937--- via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi all,
>
>
>
> I have a program (for example x.x). When I run it as “x.x -I input -o
> output”, I will get an “output” file, as well as output and error files
> from slurm. However,
> all the content of slurm output file is in the “output” file (and there
> are some extra content), so it is a waste to print the slurm output. Is it
> possible to set
> directives so that slurm only print error file?
>
>
>
> Thank you very much.
>
>
>
> Michael Morgan
>
>
>
>
>
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] formatting node names

2025-01-06 Thread Davide DelVento via slurm-users
Hi all,
I remember seeing on this list a slurm command to change a slurm-friendly
list such as

gpu[01-02],node[03-04,12-22,27-32,36]

into a bash friendly list such as

gpu01
gpu02
node03
node04
node12
etc

I made a note about it but I can't find my note anymore, nor the relevant
message. Can someone please refresh my memory? I'll be more careful with
such a note this time, I promise!

Thanks and happy new year!

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: formatting node names

2025-01-06 Thread Davide DelVento via slurm-users
Found it, I should have asked my puppet, as it's mandatory in some places
:-D
It is simply

scontrol show hostname gpu[01-02],node[03-04,12-22,27-32,36]

Sorry for the noise

On Mon, Jan 6, 2025 at 12:55 PM Davide DelVento 
wrote:

> Hi all,
> I remember seeing on this list a slurm command to change a slurm-friendly
> list such as
>
> gpu[01-02],node[03-04,12-22,27-32,36]
>
> into a bash friendly list such as
>
> gpu01
> gpu02
> node03
> node04
> node12
> etc
>
> I made a note about it but I can't find my note anymore, nor the relevant
> message. Can someone please refresh my memory? I'll be more careful with
> such a note this time, I promise!
>
> Thanks and happy new year!
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: formatting node names

2025-01-07 Thread Davide DelVento via slurm-users
Wonderful. Thanks Ole for the reminder! I had bookmarked your wiki (of
course!) but forgot to check it out in this case. I'll add a more prominent
reminder to self in my notes to always check it!

Happy new year everybody once again

On Tue, Jan 7, 2025 at 1:58 AM Ole Holm Nielsen via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi,
>
> My 2 cents: I have collected various Slurm hostlist commands in this Wiki
> page:
>
>
> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_operations/#expanding-and-collapsing-host-lists
>
> Best regards,
> Ole
>
> On 1/7/25 09:25, Steffen Grunewald via slurm-users wrote:
> > On Mon, 2025-01-06 at 12:55:12 -0700, Slurm users wrote:
> >> Hi all,
> >> I remember seeing on this list a slurm command to change a
> slurm-friendly
> >> list such as
> >>
> >> gpu[01-02],node[03-04,12-22,27-32,36]
> >>
> >> into a bash friendly list such as
> >>
> >> gpu01
> >> gpu02
> >> node03
> >> node04
> >> node12
> >> etc
> >
> > I always forget that one as well ("scontrol show hostlist" works in the
> > opposite direction) but I have a workaround at hand:
> >
> > pdsh -w gpu[01-02],node[03-04,12-22,27-32,36] -N -R exec echo %h
> >
> > You may use "-f 1", if you prefer a sorted output.
> > (I use to pipe the output through "xargs" most of the time, too.)
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Job not starting

2024-12-10 Thread Davide DelVento via slurm-users
Good sleuthing.

It would be nice if Slurm would say something like
Reason=Priority_Lower_Than_Job_<jobid> so people would immediately find the
culprit in such situations. Has anybody with a SchedMD subscription ever
asked for something like that, or is there some reason for which it'd be
impossible (or too hard) information to gather programmatically?

On Tue, Dec 10, 2024 at 1:09 AM Diego Zuccato via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Found the problem: another job was blocking access to the reservation.
> The strangest thing is that the node (gpu03) has always been reserved
> for a project, the blocking job did not explicitly request it (and even
> if it did, it would have been denied access) but its state was:
> JobState=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes:gpu03
> Dependency=(null)
>
> Paint me surprised...
>
> Diego
>
> Il 07/12/2024 10:03, Diego Zuccato via slurm-users ha scritto:
> > Ciao Davide.
> >
> > Il 06/12/2024 16:42, Davide DelVento ha scritto:
> >
> >> I find it extremely hard to understand situations like this. I wish
> >> Slurm were more clear on how it reported what it is doing, but I
> >> digress...
> > I agree. A "scontrol explain" command could be really useful to pinpont
> > the cause :)
> >
> >> I suspect that there are other job(s) which have higher priority than
> >> this one which are supposed to run on that node but cannot start
> >> because maybe this/these high-priority jobs(s) need(s) several nodes
> >> and the other nodes are not available at the moment?
> > That partition is a single node, and it's IDLE. If another job needed
> > it, it would be in PLANNED state (IIRC).
> >
> >> Pure speculation, obviously, since I have no idea what the rest of
> >> your cluster looks like, and what the rest of the workflow is, but the
> >> clue/ hint is
> >>
> >>  > JobState=PENDING Reason=Priority Dependency=(null)
> >>
> >> You are pending because something else has higher priority. Going back
> >> to my first sentence, I wish Slurm would say which one other job
> >> (maybe there are more than one, but one would suffice for this
> >> investigation) is trumping this job priority so one could more
> >> clearly understand what is going on, without sleuthing.
> > Couldn't agree more :) Scheduler is quite opaque in its decisions. :(
> >
> > Actually the job that the user submitted is not starting and has
> > Reason=PartitionConfig . But QoS 'debug' (the one I'm using for testing)
> > does have higher priority (1000) than QoS 'long' (10, IIRC).
> >
> > Diego
> >
> >> On Fri, Dec 6, 2024 at 7:36 AM Diego Zuccato via slurm-users  >> us...@lists.schedmd.com > wrote:
> >>
> >> Hello all.
> >> An user reported that a job wasn't starting, so I tried to replicate
> >> the
> >> request and I get:
> >> -8<--
> >> [root@ophfe1 root.old]# scontrol show job 113936
> >> JobId=113936 JobName=test.sh
> >>  UserId=root(0) GroupId=root(0) MCS_label=N/A
> >>  Priority=1 Nice=0 Account=root QOS=long
> >>  JobState=PENDING Reason=Priority Dependency=(null)
> >>  Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
> >>  RunTime=00:00:00 TimeLimit=2-00:00:00 TimeMin=N/A
> >>  SubmitTime=2024-12-06T13:19:36 EligibleTime=2024-12-06T13:19:36
> >>  AccrueTime=2024-12-06T13:19:36
> >>  StartTime=Unknown EndTime=Unknown Deadline=N/A
> >>  SuspendTime=None SecsPreSuspend=0
> >> LastSchedEval=2024-12-06T13:21:32
> >> Scheduler=Backfill:*
> >>  Partition=m3 AllocNode:Sid=ophfe1:855189
> >>  ReqNodeList=(null) ExcNodeList=(null)
> >>  NodeList=
> >>  NumNodes=1-1 NumCPUs=96 NumTasks=96 CPUs/Task=1
> >> ReqB:S:C:T=0:0:*:*
> >>  TRES=cpu=96,mem=95000M,node=1,billing=1296
> >>  Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> >>  MinCPUsNode=1 MinMemoryNode=95000M MinTmpDiskNode=0
> >>  Features=(null) DelayBoot=00:00:00
> >>  OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
> >>  Command=/home/root.old/test.sh
> >>  WorkDir=/home/root.old
> >>  StdErr=/home/root.old/%N-%J.err
> >>  StdIn=/dev/null
> >>  StdOut=/home/root.old/%N-%J.out
> >>  Power=
> >>
> >>
> >> [root@ophfe1 root.old]# scontrol sho partition m3
> >> PartitionName=m3
> >>  AllowGroups=ALL DenyAccounts=formazione AllowQos=ALL
> >>  AllocNodes=ALL Default=NO QoS=N/A
> >>  DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO
> GraceTime=0
> >> Hidden=NO
> >>  MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO
> >> MaxCPUsPerNode=UNLIMITED
> >>  Nodes=mtx20
> >>  PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO
> >> OverSubscribe=NO
> >>  OverTimeLimit=NONE PreemptMode=CANCEL
> >>  State=UP TotalCPUs=192 TotalNodes=1
> >> SelectTypeParameters=CR_SOCKET_MEMORY

[slurm-users] Re: Job not starting

2024-12-06 Thread Davide DelVento via slurm-users
Ciao Diego,
I find it extremely hard to understand situations like this. I wish Slurm
were clearer about how it reports what it is doing, but I digress...

I suspect that there are other job(s) which have higher priority than this
one which are supposed to run on that node but cannot start because maybe
this/these high-priority jobs(s) need(s) several nodes and the other nodes
are not available at the moment?
Pure speculation, obviously, since I have no idea what the rest of your
cluster looks like, and what the rest of the workflow is, but the clue/hint
is

> JobState=PENDING Reason=Priority Dependency=(null)

You are pending because something else has higher priority. Going back to
my first sentence, I wish Slurm would say which other job (maybe there
are more than one, but one would suffice for this investigation) is
trumping this job's priority, so one could more clearly understand what is
going on without sleuthing.
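
In the meantime, a rough way to sleuth it by hand is something like this
(partition name taken from your output; <other_jobid> is whatever candidate
job you want to compare against):

# pending/running jobs in the same partition, highest priority first
squeue -p m3 --sort=-p -o "%.10i %.9P %.10Q %.8u %.2t %.20S %r"
# and how the priority of the two jobs is built up
sprio -j 113936,<other_jobid>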

HTH


On Fri, Dec 6, 2024 at 7:36 AM Diego Zuccato via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hello all.
> An user reported that a job wasn't starting, so I tried to replicate the
> request and I get:
> -8<--
> [root@ophfe1 root.old]# scontrol show job 113936
> JobId=113936 JobName=test.sh
> UserId=root(0) GroupId=root(0) MCS_label=N/A
> Priority=1 Nice=0 Account=root QOS=long
> JobState=PENDING Reason=Priority Dependency=(null)
> Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
> RunTime=00:00:00 TimeLimit=2-00:00:00 TimeMin=N/A
> SubmitTime=2024-12-06T13:19:36 EligibleTime=2024-12-06T13:19:36
> AccrueTime=2024-12-06T13:19:36
> StartTime=Unknown EndTime=Unknown Deadline=N/A
> SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-12-06T13:21:32
> Scheduler=Backfill:*
> Partition=m3 AllocNode:Sid=ophfe1:855189
> ReqNodeList=(null) ExcNodeList=(null)
> NodeList=
> NumNodes=1-1 NumCPUs=96 NumTasks=96 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> TRES=cpu=96,mem=95000M,node=1,billing=1296
> Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> MinCPUsNode=1 MinMemoryNode=95000M MinTmpDiskNode=0
> Features=(null) DelayBoot=00:00:00
> OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
> Command=/home/root.old/test.sh
> WorkDir=/home/root.old
> StdErr=/home/root.old/%N-%J.err
> StdIn=/dev/null
> StdOut=/home/root.old/%N-%J.out
> Power=
>
>
> [root@ophfe1 root.old]# scontrol sho partition m3
> PartitionName=m3
> AllowGroups=ALL DenyAccounts=formazione AllowQos=ALL
> AllocNodes=ALL Default=NO QoS=N/A
> DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
> Hidden=NO
> MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO
> MaxCPUsPerNode=UNLIMITED
> Nodes=mtx20
> PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO
> OverSubscribe=NO
> OverTimeLimit=NONE PreemptMode=CANCEL
> State=UP TotalCPUs=192 TotalNodes=1
> SelectTypeParameters=CR_SOCKET_MEMORY
> JobDefaults=(null)
> DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
> TRES=cpu=192,mem=115M,node=1,billing=2592
> TRESBillingWeights=CPU=13.500,Mem=2.2378G
>
> [root@ophfe1 root.old]# scontrol show node mtx20
> NodeName=mtx20 Arch=x86_64 CoresPerSocket=24
> CPUAlloc=0 CPUEfctv=192 CPUTot=192 CPULoad=0.00
> AvailableFeatures=ib,matrix,intel,avx
> ActiveFeatures=ib,matrix,intel,avx
> Gres=(null)
> NodeAddr=mtx20 NodeHostName=mtx20 Version=22.05.6
> OS=Linux 4.18.0-372.9.1.el8.x86_64 #1 SMP Tue May 10 14:48:47 UTC 2022
> RealMemory=115 AllocMem=0 FreeMem=1156606 Sockets=4 Boards=1
> MemSpecLimit=2048
> State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=8 Owner=N/A MCS_label=N/A
> Partitions=m3
> BootTime=2024-12-06T10:01:42 SlurmdStartTime=2024-12-06T10:02:54
> LastBusyTime=2024-12-06T10:51:58
> CfgTRES=cpu=192,mem=115M,billing=2592
> AllocTRES=
> CapWatts=n/a
> CurrentWatts=0 AveWatts=0
> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> -8<--
>
> So the node is free, the partition does not impose extra limits (used
> only for accounting factors) but the job does not start.
>
> Any hints?
>
> Tks
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Unexpected node got allocation

2025-01-09 Thread Davide DelVento via slurm-users
I believe that, in the absence of other reasons, Slurm assigns nodes to jobs in the
order they are listed in the partition definition in slurm.conf -- perhaps
for whatever reason node41 appears first there, rather than node01?
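
An easy way to check (and, if needed, to bias the choice) is something like:

scontrol show partition <partition_name> | grep -i nodes
# node weights also influence which idle node is picked first:
scontrol show node node41 | grep -i weight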

On Thu, Jan 9, 2025 at 7:24 AM Dan Healy via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> No, sadly there’s no topology.conf in use.
>
> Thanks,
>
> Daniel Healy
>
>
> On Thu, Jan 9, 2025 at 8:28 AM Steffen Grunewald <
> steffen.grunew...@aei.mpg.de> wrote:
>
>> On Thu, 2025-01-09 at 07:51:40 -0500, Slurm users wrote:
>> > Hello there and good morning from Baltimore.
>> >
>> > I have a small cluster with 100 nodes. When the cluster is completely
>> empty
>> > of all jobs, the first job gets allocated to node 41. In other clusters,
>> > the first job gets allocated to mode 01. If I specify node 01, the
>> > allocation works perfectly. I have my partition NodeName set as
>> > node[01-99], so having node41 used first is a surprise to me. We also
>> have
>> > many other partitions which start with node41, but the partition being
>> used
>> > for the allocation starts with node01.
>> >
>> > Does anyone know what would cause this?
>>
>> Just a wild guess, but do you have a topology.conf file that somehow makes
>> this node look most reasonable to use for a single-node job?
>> (Topology attempts to assign, or hold back, sections of your network to
>> maximize interconnect bandwidth for multi-node jobs. Your node41 might be
>> one - or the first one of a series - that would leave bigger chunks unused
>> for bigger tasks.)
>>
>> HTH,
>>  Steffen
>>
>> --
>> Steffen Grunewald, Cluster Administrator
>> Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
>> Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
>> ~~~
>> Fon: +49-331-567 7274
>> Mail: steffen.grunewald(at)aei.mpg.de
>> ~~~
>>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Create filenames based on slurm hosts

2025-02-14 Thread Davide DelVento via slurm-users
Not sure I completely understand what you need, but if I do... How about

touch whatever_prefix_$(scontrol show hostname whatever_list)

where whatever_list could be your $SLURM_JOB_NODELIST ?

On Fri, Feb 14, 2025 at 9:42 AM John Hearns via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> I am working on power logging of a GPU cluster I am working with.
> I am running jobs on multiple hosts.
> I want to create a file, one for each host, which has a unique filename
> containing the host name.
> Something like
> clush  -w $SLURM_JOB_NODELIST "touch file$(hostname)"
>
> My foo is weak today. Help me Ole Wan Neilsen or any $Jedi
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Create filenames based on slurm hosts

2025-02-14 Thread Davide DelVento via slurm-users
Actually I hit sent too quickly, what I meant (assuming bash) is

for a in $(scontrol show hostname whatever_list); do touch $a; done

with the same whatever_list being $SLURM_JOB_NODELIST

On Fri, Feb 14, 2025 at 1:18 PM Davide DelVento 
wrote:

> Not sure I completely understand what you need, but if I do... How about
>
> touch whatever_prefix_$(scontrol show hostname whatever_list)
>
> where whatever_list could be your $SLURM_JOB_NODELIST ?
>
> On Fri, Feb 14, 2025 at 9:42 AM John Hearns via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
>> I am working on power logging of a GPU cluster I am working with.
>> I am running jobs on multiple hosts.
>> I want to create a file, one for each host, which has a unique filename
>> containing the host name.
>> Something like
>> clush  -w $SLURM_JOB_NODELIST "touch file$(hostname)"
>>
>> My foo is weak today. Help me Ole Wan Neilsen or any $Jedi
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: slurmrestd equivalent to "srun -n 10 echo HELLO"

2025-03-24 Thread Davide DelVento via slurm-users
If you submit the command as a script, the output and the error streams end
up in files: you may log out (or any of a gazillion other things may happen
in the meantime), so streaming to your tty/console does not make sense
anymore.
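
The usual approach is therefore to point the output somewhere you can read
afterwards, e.g. in the submitted script (the path is just an example and must
be on a filesystem you can reach from outside the job):

#!/bin/bash
#SBATCH --output=/shared/scratch/%u/hello-%j.out   # stdout and stderr land here
srun -n 10 echo HELLO

and then fetch that file once the job has completed (or set the equivalent
standard-output field in the REST job description, if your slurmrestd version
exposes one).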

On Mon, Mar 24, 2025 at 8:29 AM Dan Healy via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi Slurm Community,
>
> I'm starting to experiment with slurmrestd for a new app we're writing.
> I'm having trouble understanding one aspect of submitting jobs.
>
> When I run something like `srun -n 10 echo HELLO', I get HELLO returned to
> my console/stdout 10x.
> When I submit this command as a script to the /jobs/submit route, I get
> success/200, but *I cannot determine how to get the console output of
> HELLO 10x in any form*. It's not in my stdout log for that job even
> though I can verify that the job ran successfully.
>
> Any suggestions?
>
> --
> Thanks,
>
> Daniel Healy
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Slurm 24.05 and OpenMPI

2025-03-26 Thread Davide DelVento via slurm-users
Hi Matthias,
Let's take the simplest things out first: have you compiled OpenMPI
yourself, separately on both clusters, using the specific drivers for
whatever network you have on each? In my experience OpenMPI is quite
finicky about working correctly unless you do that. And when I don't, I
see exactly that error -- heck, sometimes I see it even when OpenMPI is
(supposedly?) compiled and linked correctly, and in such cases I resolve
it by starting jobs with "mpirun --mca smsc xpmem -n $tasks
whatever-else-you-need" (which obviously may or may not be relevant for
your case).
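
For reference, the kind of per-cluster build I mean is roughly this (the paths
are made up, and the fabric-related options depend on your hardware):

# build OpenMPI against the same PMIx that Slurm itself was built --with-pmix for
./configure --prefix=/opt/openmpi/4.1.6 \
    --with-slurm \
    --with-pmix=/opt/pmix \
    --with-ucx=/opt/ucx         # or whatever driver stack your interconnect uses
make -j && make install
# then launch with: srun --mpi=pmix ./your_app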
Cheers,
Davide

On Wed, Mar 26, 2025 at 12:51 PM Matthias Leopold via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi,
>
> I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and NVIDIA
> deepops framework a couple of years ago. It is based on Ubuntu 20.04 and
> makes use of the NVIDIA pyxis/enroot container solution. For operational
> validation I used the nccl-tests application in a container. nccl-tests
> is compiled with MPI support (OpenMPI 4.1.6 or 4.1.7) and I used it also
> for validation of MPI jobs. Slurm jobs use "pmix" and tasks are launched
> via srun (not mpirun). Some of the GPUs can talk to each other via
> Infiniband, but MPI is rarely used at our site and I'm fully aware that
> my MPI knowledge is very limited. Still it worked with Slurm 21.08.
>
> Now I built a Slurm 24.05 cluster based on Ubuntu 24.04 and started to
> move hardware there. When I run my nccl-tests container (also with newer
> software) I see error messages like this:
>
> [node1:21437] OPAL ERROR: Unreachable in file ext3x_client.c at line 111
> --
> The application appears to have been direct launched using "srun",
> but OMPI was not built with SLURM's PMI support and therefore cannot
> execute. There are several options for building PMI support under
> SLURM, depending upon the SLURM version you are using:
>
>version 16.05 or later: you can use SLURM's PMIx support. This
>requires that you configure and build SLURM --with-pmix.
>
>Versions earlier than 16.05: you must use either SLURM's PMI-1 or
>PMI-2 support. SLURM builds PMI-1 by default, or you can manually
>install PMI-2. You must then build Open MPI using --with-pmi pointing
>to the SLURM PMI library location.
>
> Please configure as appropriate and try again.
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> [node1:21437] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able
> to guarantee that all other processes were killed!
>
> One simple question:
> Is this related to https://github.com/open-mpi/ompi/issues/12471?
> If so: is there some workaround?
>
> I'm very grateful for any comments. I know that a lot of detail
> information is missing, but maybe someone can still already give me a
> hint where to look.
>
> Thanks a lot
> Matthias
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: [EXTERNAL] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI

2025-03-27 Thread Davide DelVento via slurm-users
{
  "emoji": "♥️",
  "version": 1
}
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-04-01 Thread Davide DelVento via slurm-users
Yes, I think so, but that should be no problem. I think that requires that your
Slurm was built using the --enable-multiple-slurmd configure option, so you
might need to rebuild Slurm if you didn't use that option in the first
place.
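
Roughly (paths and node names hypothetical, untested):

# rebuild Slurm with support for several slurmd instances per host
./configure --prefix=/opt/slurm --enable-multiple-slurmd
make -j && make install
# then, on the physical GPU host, start one daemon per logical node name:
slurmd -N gpu01
slurmd -N cpusingpu01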

On Mon, Mar 31, 2025 at 7:32 AM Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> Hi Davide
> Thanks for your feedback
>
> If  gpu01 and cpusingpu01 are physically the same node, doesn't this mean
> that I have to start 2 slurmd on that node (one with "slurmd -N gpu01" and
> one with "slurmd -N cpusingpu01") ?
>
>
> Thanks, Massimo
>
>
> On Mon, Mar 31, 2025 at 3:22 PM Davide DelVento 
> wrote:
>
>> Ciao Massimo,
>> How about creating another queue cpus_in_the_gpu_nodes (or something less
>> silly) which targets the GPU nodes but does not allow the allocation of the
>> GPUs with gres and allocates 96-8 (or whatever other number you deem
>> appropriate) of the CPUs (and similarly with memory)? Actually it could
>> even be the same "onlycpus" queue, just on different nodes.
>>
>> In fact, in Slurm you declare the cores (and sockets) in a node-based,
>> not queue-based, fashion. But you can set up an alias for those nodes with
>> a second name and use such a second name in the way described above. I am
>> not aware (and I have not searched for) Slurm be able to understand such a
>> situation on its own and therefore you will have to manually avoid "double
>> booking". One way of doing that could be to configure the nodes with their
>> first name in a way that Slurm thinks they have less resources. So for
>> example in slurm.conf
>>
>> NodeName=gpu[01-06] CoresPerSocket=4 RealMemory=whatever1 Sockets=2
>> ThreadsPerCore=1 Weight=1 State=UNKNOWN Gres=gpu:h100:4
>> NodeName=cpusingpu[01-06] CoresPerSocket=44 RealMemory=whatever2
>> Sockets=2 ThreadsPerCore=1 Weight=1 State=UNKNOWN
>>
>> where gpuNN and cpusingpuNN are physically the same node and whatever1 +
>> whatever2 is the actual maximum amount of memory you want Slurm to
>> allocate. And you will also want to make sure the Weight are such that the
>> non-GPU nodes get used first.
>>
>> Disclaimer: I'm thinking out loud, I have not tested this in practice,
>> there may be something I overlooked.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Mar 31, 2025 at 5:12 AM Massimo Sgaravatto via slurm-users <
>> slurm-users@lists.schedmd.com> wrote:
>>
>>> Dear all
>>>
>>>
>>>
>>> We have just installed a small SLURM cluster composed of 12 nodes:
>>>
>>> - 6 CPU only nodes: 2 Sockets=2, 96 CoresPerSocket 2, ThreadsPerCore=2,
>>> 1.5 TB of RAM
>>> - 6 nodes with also GPUS: same conf of the CPU-only node + 4 H100 per
>>> node
>>>
>>>
>>> We started with a setup with 2 partitions:
>>>
>>> - a 'onlycpus' partition which sees all the cpu-only nodes
>>> - a 'gpus' partition which sees the nodes with gpus
>>>
>>> and asked users to use the 'gpus' partition only for jobs that need gpus
>>> (for the time being we are not technically enforced that).
>>>
>>>
>>> The problem is that a job requiring a GPU usually needs only a few cores
>>> and only a few GB of RAM, which means wasting a lot of CPU cores.
>>> And having all nodes in the same partition would mean that there is the
>>> risk that a job requiring a GPU can't start if all CPU cores and/or all
>>> memory is used by CPU only jobs
>>>
>>>
>>> I went through the mailing list archive and I think that "splitting" a
>>> GPU node into two logical nodes (one to be used in the 'gpus' partition and
>>> one to be used in the 'onlycpus' partition) as discussed in [*] would help.
>>>
>>>
>>> Since that proposed solution is considered by his author a "bit of a
>>> kludge" and since I read that splitting a node into multiple logical nodes
>>> is in a general a bad idea, I'd like to understand if you could suggest
>>> other/best options.
>>>
>>>
>>> I also found this [**] thread, but I don't like too much that approach
>>> (i.e. relying on MaxCPUsPerNode) because it would mean having 3 partition
>>> (if I have got it right): two partitions for cpu only jobs and 1 partition
>>> for gpu jobs
>>>
>>>
>>> Many thanks, Massimo
>>>
>>>
>>> [*] https://groups.google.com/g/slurm-users/c/IUd7jLKME3M
>>> [**] https://groups.google.com/g/slurm-users/c/o7AiYAQ1YJ0
>>>
>>> --
>>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>>>
>>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: [EXTERNAL] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI

2025-03-28 Thread Davide DelVento via slurm-users
{
  "emoji": "👍",
  "version": 1
}
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Preemption question

2025-03-30 Thread Davide DelVento via slurm-users
Hi Kamil,

I don't use QoS, so I don't have a direct answer to your question, however
I use preemption for a queue/partition and that is extremely easy to set up
and maintain. In case your plan with QoS won't work, you can set up a
preemptable queue and force this user to submit only to that queue, which
might be adequate for your needs.
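
To make that concrete, a minimal sketch of what I mean (all names and numbers
invented, and your slurm.conf will certainly differ):

# slurm.conf
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
PartitionName=normal   Nodes=node[01-10] PriorityTier=10 PreemptMode=off
PartitionName=scavenge Nodes=node[01-10] PriorityTier=1  PreemptMode=suspend

Jobs sent to the low-tier partition get suspended whenever the high-tier one
needs the resources.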

Cheers,
Davide

On Sun, Mar 30, 2025 at 7:43 AM Kamil Wilczek via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Dear All,
>
> I would like to be able to preempt (SUSPEND) a single QoS of a user
> that blocks the queue for several days. Currently I have about
> 100 users on the cluster and it seems that setting the "Preempt"
> option to each QoS (we have personal QoSes) is not optimal.
>
>https://slurm.schedmd.com/sacctmgr.html#OPT_Preempt
>
> Is there a way to to set an option to this single problematic
> QoS, saying that the QoS can be preempted by any other QoS?
> It would be much more administrator-friendly solution ;)
>
> Kind regards
> --
> Kamil Wilczek [https://keys.openpgp.org/]
> [D415917E84B8DA5A60E853B6E676ED061316B69B]
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Slurm webhooks

2025-04-23 Thread Davide DelVento via slurm-users
Thank you all. I had thought of writing my own, but I suspected it would be
too large of a time sink. Your nudges (and example script) have convinced
me otherwise, and in fact this is what I will do!
Thanks again!
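
For the archives, the curl step I have in mind looks roughly like this (the
webhook URL and the JSON payload are placeholders for whatever Teams/Slack/your
site expects):

#!/bin/bash
# drop-in MailProg replacement: post the notification instead of mailing it
WEBHOOK_URL="https://example.com/hooks/slurm"     # placeholder
SUBJECT="$2"
BODY="Job ${SLURM_JOB_ID} (${SLURM_JOB_NAME}) on ${SLURM_CLUSTER_NAME}: ${SLURM_JOB_MAIL_TYPE} (${SLURM_JOB_STATE})"
curl -sS -X POST -H 'Content-Type: application/json' \
     -d "{\"text\": \"${SUBJECT} -- ${BODY}\"}" \
     "${WEBHOOK_URL}"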

On Tue, Apr 22, 2025 at 3:12 AM Bjørn-Helge Mevik via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Davide DelVento via slurm-users  writes:
>
> > I've gotten a request to have Slurm notify users for the typical email
> > things (job started, completed, failed, etc) with a REST API instead of
> > email. This would allow notifications in MS Teams, Slack, or log stuff in
> > some internal websites and things like that.
>
> We are just in the process of implementing this on one of our clusters.
> (The ReST API is already there, what we are implementing is Slurm using
> it instead of sending emails.)  For us, it is quite easy: Simply write a
> bash script that uses SLURM_* environment variables to get information
> about the message and user, and then uses curl to issue the required
> ReST API calls.  Then we set the MailProg parameter in slurm.conf to
> point to this script.
>
> Here is our current test version for this script (so far, it simply logs
> what it would do instead of actually contacting the ReST API, together
> with some debug output):
>
> #!/bin/bash
>
> exec &>> /tmp/mail.log
>
> echo $(date +%FT%T): Starting
> SUBJECT="$2"
> echo Args:
> while [[ $# > 0 ]]; do
> echo "$1"
> shift
> done
> echo
> echo Envs:
> env | grep SLURM | sort
> echo
>
> case $SLURM_JOB_MAIL_TYPE in
> Began) ACTION="started";;
> Ended) if [[ $SLURM_JOB_STATE == COMPLETED ]]; then
>ACTION="completed"
>elif [[ $SLURM_JOB_STATE == CANCELLED ]]; then
>ACTION="been cancelled"
>else
>ACTION="ended"
>fi;;
> Requeued) ACTION="been requeued";;
> *) ACTION="unknwon action";;
> esac
>
> BODY="Your job $SLURM_JOB_ID ($SLURM_JOB_NAME) on $SLURM_CLUSTER_NAME has
> $ACTION.
> "
>
> echo Recipient: $SLURM_JOB_USER
> echo Subject: $SUBJECT
> echo Body:
> echo $BODY
> echo
> echo Done.
>
> --
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo
>
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Can't specify multiple partitions when submitting GPU jobs

2025-04-29 Thread Davide DelVento via slurm-users
Perhaps some of the partition's default (maybe even implicit) are to blame?

On Mon, Apr 28, 2025 at 7:56 AM milad--- via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Update: I also noticed that specifying -ntasks makes a difference when
> --gpus is present.
>
> if I have two partitions a100 and h100 that both have free GPUs:
>
> ✅ h100 specified first in -p: works
> sbatch -p h100,a100 --gpus h100:1 script.sh
>
> ❌ h100 specified second: doesn't work
> sbatch -p a100,h100 --gpus h100:1 script.sh
>
> Adding --ntasks: works
> ✅ sbatch -p a100,h100 --gpus h100:1 --ntasks 1 script.sh
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Slurm webhooks

2025-04-21 Thread Davide DelVento via slurm-users
Happy Monday everybody,

I've gotten a request to have Slurm notify users for the typical email
things (job started, completed, failed, etc) with a REST API instead of
email. This would allow notifications in MS Teams, Slack, or log stuff in
some internal websites and things like that.

As far as I can tell, Slurm does not support that, for example there was
somebody who was looking for that on Galaxy and did not find a solution:
https://help.galaxyproject.org/t/web-hook-post-to-external-url-when-job-begins-and-completes-running/4017
Is that indeed the case, as searching the web indicates?

If Slurm does not support this, is there a workaround? For example, I'm
thinking of installing a local SMTP server, or an alternative/dummy mailx
program which, instead of relaying emails as requested, would make an
HTTPS POST to a webhook URL using the information from the email. I am sure I
could actually write such software myself, but I don't have enough time
to dedicate to its design, maintenance and debugging, so I am
looking for something decent already in existence. A cursory web search did
not find anything suitable, but perhaps I did not look in the appropriate
places, because my gut feeling is that somebody must have already had such
an itch to scratch!

Any other ideas about alternative ways to accomplish this?

Thanks

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: SLURM_JOB_ACCOUNT var missing in prolog

2025-03-13 Thread Davide DelVento via slurm-users
I am not sure about that one variable, however I gave up on using
environment variables in the prolog for the reasons described in an
earlier thread at the following link

https://groups.google.com/g/slurm-users/c/R9adbpdZ22E/m/cZAkDIS5AAAJ
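
For completeness, one possible workaround for the Prolog case is to look the
value up instead of trusting the environment, e.g. (untested sketch):

#!/bin/bash
# Prolog: SLURM_JOBID and SLURM_JOB_USER are among the variables that do show up here
account=$(squeue -h -j "$SLURM_JOBID" -o %a)
mkdir -p "/scratch-global/${account}/${SLURM_JOB_USER}/${SLURM_JOBID}"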

On Wed, Mar 12, 2025 at 3:36 AM Jonás Arce via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi everyone,
>
> The Slurm variable SLURM_JOB_ACCOUNT is blank in my prolog, currently i
> have a TaskProlog and a Prolog in my system, in TaskProlog this variable
> works just fine (I use TaskProlog to set env variables), but in my Prolog
> (which I use to create temporal directories), it's just blank, it's strange
> beacuse other variables such as SLURM_JOB_PARTITION, SLURM_JOB_ACCOUNT or
> SLURM_JOBID work just fine in both Prolog and TaskProlog, I looked into the
> Slurm doc and it seems that this var should work everywhere, if someone
> could shed some light into this or tell me another equivalent var i'd
> appreciate it a lot.
> I need it because I need to make this type of temporal directories with my
> Prolog: /scratch-global/$SLURM_JOB_ACCOUNT/$SLURM_JOB_USER/$SLURM_JOBID.
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Limit CPUs per job (but not per user, partition or node)

2025-02-26 Thread Davide DelVento via slurm-users
Hi Herbert,
I believe the limit is per node (not per partition) whereas you want it per
job. In other words, your users will be able to run jobs on other nodes.

There is no MaxCPUsPerJob option in the partition definition, but I believe
you can make that restriction in other ways (at worst with a job_submit.lua
but I think there would be a simpler way).
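
One of those other ways (assuming accounting/QOS is enabled; untested, and the
exact field name may vary with your Slurm version) would be a QOS attached to
the partition:

# cap any single job at 8 CPUs via a QOS
sacctmgr add qos debug8
sacctmgr modify qos debug8 set MaxTRESPerJob=cpu=8
# and in slurm.conf:
#   PartitionName=debug Nodes=node01 ... QOS=debug8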

Sorry not an answer, but hopefully a little nudge toward the solution
Davide

On Wed, Feb 26, 2025 at 11:37 AM Herbert Fruchtl via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> We have a cluster with multi-core nodes (168) that can be shared by
> multiple jobs at the same time. How do I configure a partition such that
> it only accepts jobs requesting up to (say) 8 cores, but will run
> multiple jobs at the same time? The following is apparently not working:
>
> PartitionName=debug Nodes=node01 MaxTime=02:00:00 DefMemPerCPU=1000
> MaxCPUsPerNode=8 Default=NO
>
> It allows one job using 8 cores, but a second one will not start because
> the limit is apparently for the partition as a whole.
>
> Thanks in advance,
>
>Herbert
> --
> Herbert Fruchtl (he/him)
> Senior Scientific Computing Officer / HPC Administrator
> School of Chemistry, IT Services
> University of St Andrews
> --
> The University of St Andrews is a charity registered in Scotland:
> No SC013532
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: cpus and gpus partitions and how to optimize the resource usage

2025-04-04 Thread Davide DelVento via slurm-users
Ciao Massimo,
How about creating another queue cpus_in_the_gpu_nodes (or something less
silly) which targets the GPU nodes but does not allow the allocation of the
GPUs with gres and allocates 96 minus 8 (or whatever other number you deem
appropriate) of the CPUs (and similarly with memory)? Actually it could
even be the same "onlycpus" queue, just on different nodes.

In fact, in Slurm you declare the cores (and sockets) in a node-based, not
queue-based, fashion. But you can set up an alias for those nodes with a
second name and use such a second name in the way described above. I am not
aware of (and I have not searched for) Slurm being able to understand such a
situation on its own, and therefore you will have to manually avoid "double
booking". One way of doing that could be to configure the nodes with their
first name in a way that Slurm thinks they have less resources. So for
example in slurm.conf

NodeName=gpu[01-06] CoresPerSocket=4 RealMemory=whatever1 Sockets=2
ThreadsPerCore=1 Weight=1 State=UNKNOWN Gres=gpu:h100:4
NodeName=cpusingpu[01-06] CoresPerSocket=44 RealMemory=whatever2 Sockets=2
ThreadsPerCore=1 Weight=1 State=UNKNOWN

where gpuNN and cpusingpuNN are physically the same node and whatever1 +
whatever2 is the actual maximum amount of memory you want Slurm to
allocate. And you will also want to make sure the Weight are such that the
non-GPU nodes get used first.

Disclaimer: I'm thinking out loud, I have not tested this in practice,
there may be something I overlooked.
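
The matching partition side could then look something like this (again
untested, and the cpu[01-06] name for the CPU-only boxes is invented):

PartitionName=gpus     Nodes=gpu[01-06]                              MaxTime=INFINITE State=UP
PartitionName=onlycpus Nodes=cpu[01-06],cpusingpu[01-06] Default=YES MaxTime=INFINITE State=UP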

















On Mon, Mar 31, 2025 at 5:12 AM Massimo Sgaravatto via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Dear all
>
>
>
> We have just installed a small SLURM cluster composed of 12 nodes:
>
> - 6 CPU only nodes: 2 Sockets=2, 96 CoresPerSocket 2, ThreadsPerCore=2,
> 1.5 TB of RAM
> - 6 nodes with also GPUS: same conf of the CPU-only node + 4 H100 per node
>
>
> We started with a setup with 2 partitions:
>
> - a 'onlycpus' partition which sees all the cpu-only nodes
> - a 'gpus' partition which sees the nodes with gpus
>
> and asked users to use the 'gpus' partition only for jobs that need gpus
> (for the time being we are not technically enforced that).
>
>
> The problem is that a job requiring a GPU usually needs only a few cores
> and only a few GB of RAM, which means wasting a lot of CPU cores.
> And having all nodes in the same partition would mean that there is the
> risk that a job requiring a GPU can't start if all CPU cores and/or all
> memory is used by CPU only jobs
>
>
> I went through the mailing list archive and I think that "splitting" a GPU
> node into two logical nodes (one to be used in the 'gpus' partition and one
> to be used in the 'onlycpus' partition) as discussed in [*] would help.
>
>
> Since that proposed solution is considered by his author a "bit of a
> kludge" and since I read that splitting a node into multiple logical nodes
> is in a general a bad idea, I'd like to understand if you could suggest
> other/best options.
>
>
> I also found this [**] thread, but I don't like too much that approach
> (i.e. relying on MaxCPUsPerNode) because it would mean having 3 partition
> (if I have got it right): two partitions for cpu only jobs and 1 partition
> for gpu jobs
>
>
> Many thanks, Massimo
>
>
> [*] https://groups.google.com/g/slurm-users/c/IUd7jLKME3M
> [**] https://groups.google.com/g/slurm-users/c/o7AiYAQ1YJ0
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI

2025-03-27 Thread Davide DelVento via slurm-users
Hi Matthias,
I see. It does not freak me out. Unfortunately I have very little
experience working with MPI-in-containers, so I don't know the best way to
debug this.
What I do know is that some ABIs in Slurm change with Slurm major versions
and dependencies need to be recompiled with newer versions of the latter.
So maybe trying to recompile the OpenMPI-inside-the-container against the
version of Slurm you are utilizing is the first I would try if I were in
your shoes
Best,
Davide

On Thu, Mar 27, 2025 at 4:19 AM Matthias Leopold <
matthias.leop...@meduniwien.ac.at> wrote:

> Hi Davide,
>
> thanks for reply.
> In my clusters OpenMPI is not present on the compute nodes. The
> application (nccl-tests) is compiled inside the container against
> OpenMPI. So when I run the same container in both clusters it's
> effectively the exact same OpenMPI version. I hope you don't freak out
> hearing this, but this worked with Slurm 21.08. I tried using a newer
> container version and another OpenMPI (first it was Ubuntu 20.04 with
> OpenMPI 4.1.7 from NVIDIA repo, second is Ubuntu 24.04 with Ubuntu
> OpenMPI 4.1.6), but the error is the same when running the container in
> Slurm 24.05.
>
> Matthias
>
> Am 26.03.25 um 21:24 schrieb Davide DelVento:
> > Hi Matthias,
> > Let's take the simplest things out first: have you compiled OpenMPI
> > yourself, separately on both clusters, using the specific drivers for
> > whatever network you have on each? In my experience OpenMPI is quite
> > finicky about working correctly, unless you do that. And when I don't, I
> > see exactly that error -- heck sometimes I see that even when OpenMPI is
> > (supposed?) to be compiled and linked correctly and in such cases I
> > resolve it by starting jobs with "mpirun --mca smsc xpmem -n $tasks
> > whatever-else-you-need" (which obviously may or may not be relevant for
> > your case).
> > Cheers,
> > Davide
> >
> > On Wed, Mar 26, 2025 at 12:51 PM Matthias Leopold via slurm-users
> > mailto:slurm-users@lists.schedmd.com>>
> > wrote:
> >
> > Hi,
> >
> > I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and
> NVIDIA
> > deepops framework a couple of years ago. It is based on Ubuntu 20.04
> > and
> > makes use of the NVIDIA pyxis/enroot container solution. For
> > operational
> > validation I used the nccl-tests application in a container.
> nccl-tests
> > is compiled with MPI support (OpenMPI 4.1.6 or 4.1.7) and I used it
> > also
> > for validation of MPI jobs. Slurm jobs use "pmix" and tasks are
> > launched
> > via srun (not mpirun). Some of the GPUs can talk to each other via
> > Infiniband, but MPI is rarely used at our site and I'm fully aware
> that
> > my MPI knowledge is very limited. Still it worked with Slurm 21.08.
> >
> > Now I built a Slurm 24.05 cluster based on Ubuntu 24.04 and started
> to
> > move hardware there. When I run my nccl-tests container (also with
> > newer
> > software) I see error messages like this:
> >
> > [node1:21437] OPAL ERROR: Unreachable in file ext3x_client.c at line
> 111
> >
>  --
> > The application appears to have been direct launched using "srun",
> > but OMPI was not built with SLURM's PMI support and therefore cannot
> > execute. There are several options for building PMI support under
> > SLURM, depending upon the SLURM version you are using:
> >
> > version 16.05 or later: you can use SLURM's PMIx support. This
> > requires that you configure and build SLURM --with-pmix.
> >
> > Versions earlier than 16.05: you must use either SLURM's PMI-1 or
> > PMI-2 support. SLURM builds PMI-1 by default, or you can manually
> > install PMI-2. You must then build Open MPI using --with-pmi
> > pointing
> > to the SLURM PMI library location.
> >
> > Please configure as appropriate and try again.
> >
>  --
> > *** An error occurred in MPI_Init
> > *** on a NULL communicator
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
> abort,
> > ***and potentially your MPI job)
> > [node1:21437] Local abort before MPI_INIT completed completed
> > successfully, but am not able to aggregate error messages, and not
> able
> > to guarantee that all other processes were killed!
> >
> > One simple question:
> > Is this related to https://github.com/open-mpi/ompi/issues/12471
> > ?
> > If so: is there some workaround?
> >
> > I'm very grateful for any comments. I know that a lot of detail
> > information is missing, but maybe someone can still already give me a
> > hint where to look.
> >
> > Thanks a lot
> > Matthias
> >
> >
> > --
> > slurm-users mailing list -- slurm-users@lists.schedmd.co

[slurm-users] Re: X11 performance terrible using plugin

2025-06-06 Thread Davide DelVento via slurm-users
I third the suggestion to use OnDemand. FWIW, OnDemand uses VNC under the
hood, so performance is identical to that, and the user experience is much,
much better. Plain VNC is marginally easier for the administrator to set
up: choose whether you prefer doing a bit more administration work or (a little
or a lot, depending on the sophistication of your user base) more user-support work.

To answer the original question, which most people have avoided: the
problem is that X11 is a protocol with a high number of latency-sensitive
messages being exchanged, even for a simple action such as a single button
click (let alone a menu). It was never really designed to run across
complex networks as we do today. Every time you add a hop (a Slurm one in
this case) that latency increases, and given the large number of messages
involved it easily becomes noticeable. Here Slurm could be the last straw,
but it's possible that the vast majority of the latency is introduced by
other network hops.

In our network setup, even just regular ssh with X-tunneling to the head
node results in an unusable latency for X applications running on the head
node itself (let alone on the compute nodes). OnDemand (web server on the
head node, used also as a jump to the compute nodes since they are
inaccessible from the outside) works just fine despite the additional
network hops.

If you are *really* stuck with plain X-tunneling, there isn't much you can
do other than a careful study of all the sources of latency, followed by
careful and tedious work attempting to limit them. Maybe some
tracerouting/pinging can at least give you a first rough idea of what this
would entail. You may be lucky and find a single large source which you can
easily mitigate, but in my experience it has always been a
death-by-papercuts scenario.
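
As a rough, made-up sketch of what that first look could be (host names are
placeholders here, and the exact tools depend on what you have installed),
something along the lines of

ping -c 20 headnode       # round-trip latency to the head node
ping -c 20 computenode    # and to a compute node, if reachable
traceroute computenode    # which hop contributes the most
time xdpyinfo -display $DISPLAY > /dev/null   # rough cost of real X round trips

Running those once inside a plain "ssh -Y" session and once inside an srun
one should at least show whether the extra delay comes from the Slurm hop
itself or from the rest of the path.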

On Fri, Jun 6, 2025 at 7:09 AM Burian, John via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> We’ve been using TurboVNC.
>
>
>
> *From: *Hadrian Djohari via slurm-users 
> *Date: *Friday, June 6, 2025 at 8:41 AM
> *To: *John Hearns 
> *Cc: *Simon Andrews ,
> slurm-users@lists.schedmd.com 
> *Subject: *[slurm-users] Re: X11 performance terrible using plugin
>
>
> Or use Open OnDemand platform for the interactive Desktop.
>
> https://openondemand.org/
> 
>
>
>
> On Fri, Jun 6, 2025 at 8:37 AM John Hearns via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
> Simon, I have had success in the past by using NICE DCV (now owned by AWS
> but you can get licenses for on prem)
> https://www.ni-sp.com/products/nice-dcv
> 
>
> An alternative would be VirtualGL
>
> Altair Access (though more likely to work with PBS!)
> https://altair.com/access
> 
>
>
>
>
>
>
>
> On Fri, 6 Jun 2025 at 10:42, Simon Andrews via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
> On our cluster we’ve noticed that if we use the native x11 slurm plugin
> (PrologFlags=x11) then X applications work, but are really slow and
> unresponsive.  Even opening menus on graphical applications is painfully
> slow.
>
>
>
> On the same system if I do a direct ssh connection with ssh -YC from the
> head node the same applications are quick and responsive.
>
>
>
> Any suggestions for what might be causing this, and how I can get the
> native x11 to have the same responsiveness as a direct ssh connection?
>
>
>
> Many thanks
>
>
>
> Simon.
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe se

[slurm-users] Re: X11 performance terrible using plugin

2025-06-06 Thread Davide DelVento via slurm-users
> The issue isn’t network bandwidth

Latency. The issue with X is always latency, not bandwidth.

On Fri, Jun 6, 2025 at 8:57 AM Simon Andrews via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Thanks for the suggestions – some interesting links to read.  We already
> have an option to run graphical sessions from the head node using Apache
> Guacamole which works well, but that still requires X11 to talk to the
> compute nodes.  We don’t have a full desktop stack on the compute nodes and
> just want to run individual applications.
>
>
>
> The issue isn’t network bandwidth – I can launch two graphical instances,
> one via ssh and the other via srun at the same time on the same compute
> node and ssh is great but srun is terrible.  We do route through the head
> node (the compute nodes aren’t directly addressable) but the overall
> traffic on the head node is pretty modest.
>
>
>
> I’m not really sure how the X11 plugin works (assuming it’s not just doing
> ssh X tunnelling) to try to think what else could be limiting here.
>
>
>
> Simon.
>
>
>
> *From:* Jason Simms 
> *Sent:* 06 June 2025 13:52
> *To:* Hadrian Djohari 
> *Cc:* John Hearns ; Simon Andrews <
> simon.andr...@babraham.ac.uk>; slurm-users@lists.schedmd.com
> *Subject:* Re: [slurm-users] Re: X11 performance terrible using plugin
>
>
>
>
>
>
>
>
> It may or may not be an appropriate solution for your use cases, but I
> second using Open OnDemand and its virtual desktop. It is FAR more
> performant than X11 through Slurm/SSH.
>
> *Jason L. Simms, Ph.D., M.P.H.*
>
> Research Computing Manager
>
> Swarthmore College
> Information Technology Services
>
> (610) 328-8102
>
>
>
>
>
> On Fri, Jun 6, 2025 at 8:40 AM Hadrian Djohari via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
> Or use Open OnDemand platform for the interactive Desktop.
>
> https://openondemand.org/
> 
>
>
>
> On Fri, Jun 6, 2025 at 8:37 AM John Hearns via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
> Simon, I have had success in the past by using NICE DCV (now owned by AWS
> but you can get licenses for on prem)
> https://www.ni-sp.com/products/nice-dcv
> 
>
> An alternative would be VirtualGL
>
> Altair Access (though more likely to work with PBS!)
> https://altair.com/access
> 
>
>
>
>
>
>
>
> On Fri, 6 Jun 2025 at 10:42, Simon Andrews via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
>
> On our cluster we’ve noticed that if we use the native x11 slurm plugin
> (PrologFlags=x11) then X applications work, but are really slow and
> unresponsive.  Even opening menus on graphical applications is painfully
> slow.
>
>
>
> On the same system if I do a direct ssh connection with ssh -YC from the
> head node the same applications are quick and responsive.
>
>
>
> Any suggestions for what might be causing this, and how I can get the
> native x11 to have the same responsiveness as a direct ssh connection?
>
>
>
> Many thanks
>
>
>
> Simon.
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>
>
>
> --
>
> Hadrian Djohari
> Director of Advanced Research Computing, [U]Tech
> Case Western Reserve University
> (W): 216-368-0395
> (M): 216-798-7490
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>
> --
> slurm-users mailing list -- slurm-

[slurm-users] Re: Implementing a "soft" wall clock limit

2025-06-11 Thread Davide DelVento via slurm-users
Thanks Loris,

Am I correct in reading between the lines that you're saying: rather than
going on with my "soft" limit idea, just use the regular hard limits, with a
generous default and some user education instead? In fact that is an
alternative approach that I am considering too.
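
(For completeness, and just as a sketch with invented numbers: the "generous
default" route would be little more than adding something like

DefaultTime=2-00:00:00 MaxTime=14-00:00:00

to the relevant PartitionName lines in slurm.conf, with the actual values of
course to be tuned to the local workload.)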

On Wed, Jun 11, 2025 at 6:15 AM Loris Bennett via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi Davide,
>
> Davide DelVento via slurm-users
>  writes:
>
> > In the institution where I work, so far we have managed to live
> > without mandatory wallclock limits (a policy decided well before I
> > joined the organization), and that has been possible because the
> > cluster was not very much utilized.
> >
> > Now that is changing, with more jobs being submitted and those being
> > larger ones. As such I would like to introduce wallclock limits to
> > allow slurm to be more efficient in scheduling jobs, including with
> > backfill.
> >
> > My concern is that this user base is not used to it and therefore I
> > want to make it easier for them, and avoid common complaints. I
> > anticipate one of them would be "my job was cancelled even though
> > there were enough nodes idle and no other job in line after mine"
> > (since the cluster utilization is increasing, but not yet always full
> > like it has been at most other places I know).
> >
> > So my question is: is it possible to implement "soft" wallclock limits
> > in slurm, namely ones which would not be enforced unless necessary to
> > run more jobs? In other words, is it possible to change the
> > pre-emptability of a job only after some time has passed? I can think
> > of some ways to hack this functionality myself with some cron or at
> > jobs, and that might be easy enough to do, but I am not sure I can
> > make it robust enough to cover all situations, so I'm looking for
> > something either slurm-native or (if external solution) field-tested
> > by someone else already, so that at least the worst kinks have been
> > already ironed out.
> >
> > Thanks in advance for any suggestions you may provide!
>
> We just have a default wallclock limit of 14 days, but we also have QOS
> with shorter wallclock limits but with higher priorities, albeit for
> fewer jobs and resources:
>
> $ sqos
>   Name   Priority MaxWall MaxJobs MaxSubmitMaxTRESPU
> -- -- --- --- - 
> hiprio 1003:00:00  50   100   cpu=128,gres/gpu=4
>   prio   1000  3-00:00:00 500  1000   cpu=256,gres/gpu=8
>   standard  0 14-00:00:002000 1  cpu=768,gres/gpu=16
>
> We also have a page of documentation which explains how users can profit
> from backfill.  Thus users have a certain incentive to specify a shorter
> wallclock limit, if they can.
>
> 'sqos' is just an alias for
>
>   sacctmgr show qos
> format=name,priority,maxwall,maxjobs,maxsubmitjobs,maxtrespu%20
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Herr/Mr)
> FUB-IT, Freie Universität Berlin
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Implementing a "soft" wall clock limit

2025-06-12 Thread Davide DelVento via slurm-users
Sounds good, thanks for confirming it.
Let me sleep on it w.r.t. the "too many" QOS, or decide whether I should
ditch this idea.
If I'll implement it, I'll post in this conversation details on how I did
it.
Cheers

On Thu, Jun 12, 2025 at 6:59 AM Ansgar Esztermann-Kirchner <
aesz...@mpinat.mpg.de> wrote:

> On Thu, Jun 12, 2025 at 04:52:24AM -0600, Davide DelVento wrote:
> > Hi Ansgar,
> >
> > This is indeed what I was looking for: I was not aware of
> PreemptExemptTime.
> >
> > From my cursory glance at the documentation, it seems
> > that PreemptExemptTime is QOS-based and not job based though. Is that
> > correct? Or could it be set per-job, perhaps on a prolog/submit lua
> script?
>
> Yes, that's correct.
> I guess you could create a bunch of QOS with different
> PremptExemptTimes and then let the user select one (or indeed select
> it from lua) but as far as I know, there is no way to set arbitrary
> per-job values.
>
> Best,
>
> A.
> --
> Ansgar Esztermann
> Sysadmin Dep. Theoretical and Computational Biophysics
> https://www.mpinat.mpg.de/person/11315/3883774
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Implementing a "soft" wall clock limit

2025-06-11 Thread Davide DelVento via slurm-users
In the institution where I work, so far we have managed to live without
mandatory wallclock limits (a policy decided well before I joined the
organization), and that has been possible because the cluster was not very
much utilized.

Now that is changing, with more jobs being submitted and those being larger
ones. As such I would like to introduce wallclock limits to allow slurm to
be more efficient in scheduling jobs, including with backfill.

My concern is that this user base is not used to it and therefore I want to
make it easier for them, and avoid common complaints. I anticipate one of
them would be "my job was cancelled even though there were enough nodes
idle and no other job in line after mine" (since the cluster utilization is
increasing, but not yet always full like it has been at most other places I
know).

So my question is: is it possible to implement "soft" wallclock limits in
slurm, namely ones which would not be enforced unless necessary to run more
jobs? In other words, is it possible to change the pre-emptability of a job
only after some time has passed? I can think of some ways to hack this
functionality myself with some cron or at jobs, and that might be easy
enough to do, but I am not sure I can make it robust enough to cover all
situations, so I'm looking for something either slurm-native or (if
external solution) field-tested by someone else already, so that at least
the worst kinks have been already ironed out.

Thanks in advance for any suggestions you may provide!

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Implementing a "soft" wall clock limit

2025-06-12 Thread Davide DelVento via slurm-users
Hi Ansgar,

This is indeed what I was looking for: I was not aware of PreemptExemptTime.

From my cursory glance at the documentation, it seems
that PreemptExemptTime is QOS-based and not job based though. Is that
correct? Or could it be set per-job, perhaps on a prolog/submit lua script?
I'm thinking that the user could use the regular wallclock limit setting in
slurm and the script could remove that and use it to set
the PreemptExemptTime.
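
Just to make my idea more concrete, a minimal sketch of the QOS-based
variant (QOS names and times are invented here, and this assumes preemption
is already enabled cluster-wide with PreemptMode=CANCEL as in the page you
linked below) would be something like

sacctmgr add qos soft06h
sacctmgr modify qos soft06h set PreemptExemptTime=06:00:00
sacctmgr add qos soft24h
sacctmgr modify qos soft24h set PreemptExemptTime=1-00:00:00

with the submit script (or the users themselves) picking whichever QOS is
closest to the wallclock they would otherwise have requested.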

Thanks,
Davide

On Thu, Jun 12, 2025 at 3:56 AM Ansgar Esztermann-Kirchner via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hi Davide,
>
> I think it should be possible to emulate this via preemption: if you
> set PreemptMode to CANCEL, a preempted job will behave just as if it
> reached the end of its wall time. Then, you can use PreemptExemptTime
> as your soft wall time limit -- the job will not be preempted before
> PreemptExemptTime has passed.
>
> See https://slurm.schedmd.com/preempt.html
>
>
> Best,
>
> A.
>
> --
> Ansgar Esztermann
> Sysadmin Dep. Theoretical and Computational Biophysics
> https://www.mpinat.mpg.de/person/11315/3883774
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Problem querying slurm batch script archive

2025-06-09 Thread Davide DelVento via slurm-users
I configured slurm.conf on our machine with

AccountingStoreFlags=job_script,job_comment

So that people can run

sacct -Bj <job_ID>

and check what the batch script was for the specified job_ID.

This works fine for root and works great for many users (who can see their
own job scripts, but not those of other users, as it ought to be).

Unfortunately, for several users (though not even close to half of the users
of the machine), the sacct -Bj command returns nothing, as if they had
queried someone else's job, even when querying their own.

Does anybody have any clue on what might be wrong and triggering this odd
behavior?
Thanks!

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Job information if job is completed

2025-06-17 Thread Davide DelVento via slurm-users
Yeah, that's an annoying thing, and I have never understood why it was
designed that way. The information is there and can be queried, just with a
different command, which spits it out in a different format. The syntax is

sacct -j XXX

which gives you only some fields, or

sacct -o fields,you,want -j XXX

The (super long) list of possible fields (case insensitive) can be queried
with

sacct -e

HTH

On Tue, Jun 17, 2025 at 4:45 AM Gestió Servidors via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hello,
>
>
>
> Is there any way to get all information (like submit script or submit
> node) from a job that is completed? Something like “scontrol show
> jobid=XXX” when job is “running” or “pending”. I need to inspect the submit
> script of a job but I only know job_id.
>
>
>
> Thanks.
>
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Implementing a "soft" wall clock limit

2025-06-17 Thread Davide DelVento via slurm-users
> 1. It reminds them that it's important to be conscious of timelimits
> when submitting jobs
>
> This is a good point.  We use 'jobstats', which provides information
> after a job has completed, about run time relative to time limit,
> amongst other things, although unfortunately many people don't seem to
> read this.  However, even if you do force people to set a time limit,
> they can still choose not to think about it and just set the maximum.
>
> > 2. If a job is killed before it's done and all the progress is lost
> because the job wasn't checkpointing, they can't blame you as the admin.
>
> I don't really understand this point.  The limit is just the way it is,
> just as we have caps on the total number of cores or GPUs the jobs of a
> given user can use at any one time.  Up to now no-one has blamed us for this.
>
> > If you do this, it's easy to get the users on board by first providing
> useful and usable documentation on why timelimits are needed and how to set
> them. Be
> > sure to hammer home the point that effective timelimits can lead to
> their jobs running sooner, and that effective timelimits can increase
> cluster
> > efficiency/utilization, helping them get a better return on their
> investment (if they contribute to the clusters cost) or they'll get more
> science done. I like to
> > frame it that accurate wallclock times will give them a competitive edge
> in getting their jobs running before other cluster users. Everyone likes to
> think what
> > they're doing will give them an advantage!
>
> I agree with all this and this is also what we also try to do.  The only
> thing I don't concur with is your last sentence.  In my experience, as
> long as things work, users will in general not give a fig about whether
> they are using resources efficiently.  Only when people notice a delay
> in jobs starting do they become more aware about it and are prepared to
> take action.  It is particularly a problem with new users, because
> fairshare means that their jobs will start pretty quickly, no matter how
> inefficiently they have configured them.  Maybe we should just give new
> users fewer share initially and only later bump them up to some standard
> value.
>
> Cheers,
>
> Loris
>
> > My 4 cents (adjusted for inflation).
> >
> > Prentice
> >
> > On 6/12/25 9:11 PM, Davide DelVento via slurm-users wrote:
> >
> >  Sounds good, thanks for confirming it.
> >  Let me sleep on it wrt the "too many" QOS, or think if I should ditch
> this idea.
> >  If I'll implement it, I'll post in this conversation details on how I
> did it.
> >  Cheers
> >
> >  On Thu, Jun 12, 2025 at 6:59 AM Ansgar Esztermann-Kirchner <
> aesz...@mpinat.mpg.de> wrote:
> >
> >  On Thu, Jun 12, 2025 at 04:52:24AM -0600, Davide DelVento wrote:
> >  > Hi Ansgar,
> >  >
> >  > This is indeed what I was looking for: I was not aware of
> PreemptExemptTime.
> >  >
> >  > From my cursory glance at the documentation, it seems
> >  > that PreemptExemptTime is QOS-based and not job based though. Is that
> >  > correct? Or could it be set per-job, perhaps on a prolog/submit lua
> script?
> >
> >  Yes, that's correct.
> >  I guess you could create a bunch of QOS with different
> >  PremptExemptTimes and then let the user select one (or indeed select
> >  it from lua) but as far as I know, there is no way to set arbitrary
> >  per-job values.
> >
> >  Best,
> >
> >  A.
> >  --
> >  Ansgar Esztermann
> >  Sysadmin Dep. Theoretical and Computational Biophysics
> >  https://www.mpinat.mpg.de/person/11315/3883774
> >
> >
> > --
> > Prentice Bisbal
> > HPC Systems Engineer III
> > Computational & Information Systems Laboratory (CISL)
> > NSF National Center for Atmospheric Research (NSF NCAR)
> > https://www.cisl.ucar.edu
> > https://ncar.ucar.edu
> --
> Dr. Loris Bennett (Herr/Mr)
> FUB-IT, Freie Universität Berlin
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com