[slurm-users] Re: Plese help [CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=1]

2025-02-16 Thread Diego Zuccato via slurm-users
Hi Hugo. i[3-9] have 2 kinds of cores: the more performant ones with hyperthreading and the slower ones without. From https://www.intel.com/content/www/us/en/products/docs/processors/core/core-14th-gen-desktop-brief.html : -8<-- These processors feature performance hybrid architecture, com

[slurm-users] Re: Job not starting

2024-12-10 Thread Diego Zuccato via slurm-users
=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes:gpu03 Dependency=(null) Paint me surprised... Diego On 07/12/2024 10:03, Diego Zuccato via slurm-users wrote: Hi Davide. On 06/12/2024 16:42, Davide DelVento wrote: I find it extremely hard to understand situations like this. I wish

[slurm-users] Re: Job not starting

2024-12-07 Thread Diego Zuccato via slurm-users
'long' (10, IIRC). Diego On Fri, Dec 6, 2024 at 7:36 AM Diego Zuccato via slurm-users <slurm-users@lists.schedmd.com> wrote: Hello all. A user reported that a job wasn't starting, so I tried to replicate the request and

[slurm-users] Job not starting

2024-12-06 Thread Diego Zuccato via slurm-users
Hello all. A user reported that a job wasn't starting, so I tried to replicate the request and I get: -8<-- [root@ophfe1 root.old]# scontrol show job 113936 JobId=113936 JobName=test.sh UserId=root(0) GroupId=root(0) MCS_label=N/A Priority=1 Nice=0 Account=root QOS=long JobState=PENDIN

[slurm-users] Re: Suspending jobs and resuming

2024-11-21 Thread Diego Zuccato via slurm-users
IIUC, when you suspend a job it remains in memory but with no CPU time allocated. If you reboot the node, the job state is lost (unless it uses checkpointing). When you restarted the jobs, they actually began a new run (Slurm doesn't know if they use checkpointing or not). You've been lucky tha

[slurm-users] Re: Does Slurm support DSP

2024-11-20 Thread Diego Zuccato via slurm-users
And, if it's a device (like a PCIe board), can it be shared between processes or not? If it's shareable (like a network interface), you can configure it as a feature. If it's not, you have to make it a TRES (and possibly configure cgroups to deny access from jobs that did not request it). Diego

[slurm-users] Re: Change primary alloc node

2024-10-31 Thread Diego Zuccato via slurm-users
Seems the perfect use case for heterogeneous jobs... Diego On 31/10/2024 14:18, Davide DelVento via slurm-users wrote: Another possible use case of this is a regular MPI job where the first/controller task often uses more memory than the workers and may need to be scheduled on a higher m

[slurm-users] Re: Nodes TRES double what is requested

2024-07-10 Thread Diego Zuccato via slurm-users
Hint: round the RAM reported by 'slurmd -C' down a bit. Otherwise you risk the nodes not coming back up after an upgrade that leaves a bit less free RAM than configured. Diego On 10/07/2024 17:29, Brian Andrus via slurm-users wrote: Jack, To make sure things are set right, run 'slurmd -C' on t
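
A minimal sketch of the rounding hint above, assuming the output of 'slurmd -C' contains a RealMemory=<MB> field. The sample line, the node name, and the 1024 MB margin and step are made-up illustration values, not recommendations from the thread.

```python
import re


def rounded_real_memory(slurmd_c_line: str, margin_mb: int = 1024, step_mb: int = 1024) -> int:
    """Return a RealMemory value (in MB) a bit below what 'slurmd -C' reported.

    margin_mb and step_mb are hypothetical defaults chosen for this sketch.
    """
    match = re.search(r"RealMemory=(\d+)", slurmd_c_line)
    if match is None:
        raise ValueError("no RealMemory field found in the given line")
    reported_mb = int(match.group(1))
    # Subtract a safety margin, then round down to a multiple of step_mb.
    return max(0, (reported_mb - margin_mb) // step_mb * step_mb)


# Hypothetical sample line; paste the real output of 'slurmd -C' here instead.
sample = "NodeName=node01 CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=64215"
print(rounded_real_memory(sample))
```

The printed number is what would go into the node's RealMemory= setting in slurm.conf, so that a slightly smaller amount of detected RAM after an upgrade still satisfies the configured value.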

[slurm-users] Re: Executing srun -n X where X is greater than total CPU in entire cluster

2024-05-30 Thread Diego Zuccato via slurm-users
IIUC you can't do that. You either allow overcommit or you split your job into multiple, smaller jobs that fit. The resources you're requesting must be available at the same time: if your job needs 2 CPUs and you want to run it in parallel, just use a job array. If you request 500 CPUs it mean
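
A rough sketch of the job-array suggestion above: a helper that builds an sbatch command line for many small, independent tasks instead of one oversized request. The script name work.sh, the task count, and the helper itself are hypothetical; only the sbatch options --array and --cpus-per-task are real.

```python
from typing import List, Optional


def array_job_command(script: str, n_tasks: int, cpus_per_task: int,
                      max_running: Optional[int] = None) -> List[str]:
    """Build an sbatch argument list that runs n_tasks independent array tasks."""
    if n_tasks < 1 or cpus_per_task < 1:
        raise ValueError("n_tasks and cpus_per_task must be positive")
    array_spec = f"0-{n_tasks - 1}"
    if max_running is not None:
        # '%N' caps how many array tasks may run at the same time.
        array_spec += f"%{max_running}"
    return ["sbatch", f"--array={array_spec}", f"--cpus-per-task={cpus_per_task}", script]


# Hypothetical numbers: 250 tasks of 2 CPUs each, at most 50 running at once.
print(" ".join(array_job_command("work.sh", 250, 2, max_running=50)))
```

Each array task is scheduled on its own, so tasks start as soon as 2 CPUs are free rather than waiting for all the resources to be available at the same time.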

[slurm-users] Re: [EXTERN] Re: scheduling according time requirements

2024-04-30 Thread Diego Zuccato via slurm-users
Try adding to the config: EnforcePartLimits=ANY JobSubmitPlugins=all_partitions Diego On 30/04/2024 15:11, Dietmar Rieder via slurm-users wrote: Hi Loris, On 4/30/24 2:53 PM, Loris Bennett via slurm-users wrote: Hi Dietmar, Dietmar Rieder via slurm-users writes: Hi, is it possible t

[slurm-users] Re: Lua script

2024-03-06 Thread Diego Zuccato via slurm-users
On 06/03/2024 13:49, Gestió Servidors via slurm-users wrote: And how can I reject the job inside the lua script? Just use return slurm.FAILURE and the job will be refused. -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Università di Bologna

[slurm-users] Re: Lua script

2024-03-06 Thread Diego Zuccato via slurm-users
I don't know why that happens (other than you're opening a comment and not closing it, IIUC), but it would probably be less surprising to just reject the submission than reduce the limit. In the (rare...) case the user actually needs all the time requested, you risk wasting resources. If you rej