Re: [slurm-users] Job Step Output Delay

2021-02-10 Thread Aaron Jackson
Is it being written to NFS? You say your local dev cluster is a single node. Is that node the login node as well as the compute node? In that case I guess there is no NFS. A larger cluster will be using some sort of shared storage, so whichever shared file system you are using likely has caching. If you a

Re: [slurm-users] Job flexibility with cons_tres

2021-02-10 Thread Aaron Jackson
A similar problem exists in the cluster I look after. I have a job_submit script which adds certain nodes to the job's excluded-nodes list based on each node's number of cpus per gpu. This basically solved the fragmentation problem entirely. The problem is that cons_tres seems to think (for example) tha
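The exclusion idea described above can be sketched outside Slurm. The real version would live in a job_submit.lua plugin; the node names, core/GPU counts, and threshold below are all hypothetical:

```python
# Sketch of the excluded-nodes logic described above (hypothetical node data).
# In production this runs inside Slurm's job_submit.lua plugin, not Python.

def build_exclude_list(nodes: dict, cpus: int, gpus: int) -> list:
    """Return node names whose cpus-per-gpu ratio is too small for this job."""
    need = cpus / gpus  # cores the job wants per GPU; gpus assumed non-zero
    return sorted(
        name for name, spec in nodes.items()
        if spec["cpus"] / spec["gpus"] < need
    )

# Hypothetical heterogeneous nodes: {name: {cpus, gpus}}
nodes = {
    "gpu01": {"cpus": 16, "gpus": 4},   # 4 cores per GPU
    "gpu02": {"cpus": 48, "gpus": 4},   # 12 cores per GPU
    "gpu03": {"cpus": 64, "gpus": 8},   # 8 cores per GPU
}

# A job asking for 8 cores per GPU would fragment gpu01, so exclude it.
print(build_exclude_list(nodes, cpus=8, gpus=1))  # → ['gpu01']
```

The returned list would then be appended to the job's excluded-nodes field so cons_tres never places the job there.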

Re: [slurm-users] Reserve some cores per GPU

2020-10-20 Thread Aaron Jackson
I look after a very heterogeneous GPU Slurm setup, and some nodes have rather few cores. We use a job_submit lua script which calculates the number of requested cpu cores per gpu. This is then used to scan through a table of 'weak nodes' based on a 'max cores per gpu' property. The node names are app
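The 'weak nodes' scan described above could look something like the following sketch. Node names and the per-node limits are made up, and the real implementation is a Lua job_submit plugin rather than Python:

```python
# Sketch of the 'weak nodes' table scan described above (names and limits
# are illustrative); the real version is Slurm's job_submit.lua plugin.

WEAK_NODES = {   # node name -> 'max cores per gpu' property
    "weak01": 4,
    "weak02": 6,
}

def nodes_to_exclude(requested_cpus_per_gpu: int) -> str:
    """Comma-separated list of weak nodes that cannot satisfy the request,
    in the form a job's excluded-nodes field expects."""
    names = [name for name, cap in WEAK_NODES.items()
             if requested_cpus_per_gpu > cap]
    return ",".join(sorted(names))

# A job wanting 5 cores per GPU rules out weak01 (cap 4) but not weak02 (cap 6).
print(nodes_to_exclude(5))  # → weak01
```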

Re: [slurm-users] Filter slurm e-mail notification

2019-11-27 Thread Aaron Jackson
>> Hi, >> >> I guess you could use a lua script to filter out flags you don't >> want. I haven't tried it with mail flags, but I'm using a script like >> the one referenced to enforce accounts/time limits, etc. >> >> https://funinit.wordpress.com/2018/06/07/how-to-use-job_submit_lua-with-slurm/ >>

Re: [slurm-users] Filter slurm e-mail notification

2019-11-27 Thread Aaron Jackson
> Hi, > > I guess you could use a lua script to filter out flags you don't > want. I haven't tried it with mail flags, but I'm using a script like > the one referenced to enforce accounts/time limits, etc. > > https://funinit.wordpress.com/2018/06/07/how-to-use-job_submit_lua-with-slurm/ > > Cheers
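The lua-filtering idea suggested above could take roughly this shape. Slurm's mail_type is a bitmask; the flag values below mirror the MAIL_JOB_* constants in slurm.h as I understand them, but treat both the values and the policy as assumptions, and note the real logic would sit in job_submit.lua:

```python
# Sketch of filtering a job's mail_type bitmask at submit time.
# Flag values assumed to match Slurm's MAIL_JOB_* constants (check slurm.h);
# in production this would be done in job_submit.lua, not Python.
MAIL_JOB_BEGIN   = 0x0001
MAIL_JOB_END     = 0x0002
MAIL_JOB_FAIL    = 0x0004
MAIL_JOB_REQUEUE = 0x0008

ALLOWED = MAIL_JOB_END | MAIL_JOB_FAIL   # hypothetical site policy

def filter_mail_type(mail_type: int) -> int:
    """Strip any mail flags the site does not want to deliver."""
    return mail_type & ALLOWED

# A user asking for --mail-type=BEGIN,END keeps only END.
print(filter_mail_type(MAIL_JOB_BEGIN | MAIL_JOB_END))  # → 2
```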

Re: [slurm-users] Get GPU usage from sacct?

2019-11-16 Thread Aaron Jackson
Janne Blomqvist writes: > On 14/11/2019 20.41, Prentice Bisbal wrote: >> Is there any way to see how much a job used the GPU(s) on a cluster >> using sacct or any other slurm command? >> > > We have created > https://github.com/AaltoScienceIT/ansible-role-sacct_gpu/ as a quick > hack to put GPU uti
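The general shape of such a hack (not necessarily how the Aalto role linked above implements it) is to sample GPU utilization with nvidia-smi during the job, average the readings in an epilog, and stash the result somewhere sacct can display, such as the job's comment field. The averaging step might look like:

```python
# Averaging step of a GPU-utilisation hack (illustrative, not the Aalto
# role's actual code): samples come from something like
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
# which prints one percentage per line.
def mean_gpu_util(samples: str) -> float:
    """Average utilisation across all sampled readings."""
    values = [float(line) for line in samples.splitlines() if line.strip()]
    return sum(values) / len(values)

print(mean_gpu_util("90\n70\n80\n"))  # → 80.0
```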

Re: [slurm-users] gres:gpu managment

2019-05-23 Thread Aaron Jackson
; Name=gpu Type=v100 File=/dev/nvidia1 CPUs=0-17,36-53 > Name=gpu Type=v100 File=/dev/nvidia2 CPUs=18-35,54-71 > Name=gpu Type=v100 File=/dev/nvidia3 CPUs=18-35,54-71 > > Any help appreciated. > > Thanks, Daniel Vecerka CTU Prague Do jobs actually end up on the same GPU though?
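For context, a gres.conf that pins each GPU to the cpu cores on its socket generally follows the pattern quoted above. The values below are illustrative, not the poster's exact file (the first line of his config is cut off in the preview):

```
# gres.conf sketch (illustrative values): each GPU device is bound
# to the CPU core ranges of the socket it is attached to.
Name=gpu Type=v100 File=/dev/nvidia0 CPUs=0-17,36-53
Name=gpu Type=v100 File=/dev/nvidia1 CPUs=0-17,36-53
Name=gpu Type=v100 File=/dev/nvidia2 CPUs=18-35,54-71
Name=gpu Type=v100 File=/dev/nvidia3 CPUs=18-35,54-71
```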

Re: [slurm-users] Kinda Off-Topic: data management for Slurm clusters

2019-02-22 Thread Aaron Jackson
the funding for 25-100G > networks and/or all-flash commercial data storage appliances (NetApp, Pure, > etc.) > > Any good patterns that I might be able to learn about implementing here? We > have a few ideas floating about, but I figured this already may be a solved > problem in this community... > > Thanks! > Will -- Aaron Jackson - M6PIU http://aaronsplace.co.uk/

Re: [slurm-users] Disabling --nodelist

2018-11-27 Thread Aaron Jackson
t, and command line options will override any environment variables“ > If they call --nodelist in the sbatch script, this may solve the problem. > > Beyond all that I would just contact those users and tell them not to use > nodelist. > > Andreas >> Am 27.11.2018 um 18:05 schrieb Aa

[slurm-users] Disabling --nodelist

2018-11-27 Thread Aaron Jackson
Hi all, I am wondering if it is possible to disable the use of the --nodelist argument from srun/sbatch/salloc/etc.? In the worst case I could just edit the code for argument parsing. Having only recently moved over to Slurm, some users have a preference for particular nodes with no justifiable reas
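Rather than patching the argument parser, a job_submit plugin could reject jobs that name specific nodes. A minimal sketch, assuming the plugin sees the request in a req_nodes field (as job_submit.lua does) and using stand-in return codes:

```python
# Sketch of rejecting jobs that use --nodelist, mirroring what a
# job_submit.lua plugin could do with job_desc.req_nodes.
# Return codes are stand-ins for Slurm's actual error constants.
ESUCCESS = 0
ESLURM_INVALID_NODE_NAME = -1

def job_submit(job_desc: dict) -> int:
    """Refuse any job that names specific nodes."""
    if job_desc.get("req_nodes"):
        print("--nodelist is disabled on this cluster")
        return ESLURM_INVALID_NODE_NAME
    return ESUCCESS

job_submit({"req_nodes": "node01"})  # rejected with a message
job_submit({})                       # accepted
```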