Re: [slurm-users] Drain a single user's jobs

2020-04-01 Thread David Rhey
sting jobs aren't running. I'm seeing the reason code > "InvalidQOS". > > Any ideas what I should be looking at, please? > > Thanks, > > Mark > > -- David Rhey --- Advanced Research Computing - Technology Services University of Michigan

Re: [slurm-users] Job with srun is still RUNNING after node reboot

2020-03-31 Thread David Rhey
ncountered this? or know how to make the job state not > RUNNING after it's clearly not running? > > Thanks in advance, > Yair.

Re: [slurm-users] Slurm very rarely assigned an estimated start time to a job

2019-10-03 Thread David Rhey
either. > > I also would welcome discussion of how to tune the backfill scheduler! > I suspect that in order to work well, it needs a particular distribution > of job priorities. > > regards, mark hahn.

Re: [slurm-users] Does Slurm store "time in current state" values anywhere ?

2019-10-03 Thread David Rhey
essed on the > "user-side" ? > > > What we're trying to avoid is the need to write a not-quite-Slurm > database that stores such info by continually polling our actual > Slurm database, because we don't think of ourselves as meta-scheduler > writers. > > H

Re: [slurm-users] Maxjobs not being enforced

2019-09-18 Thread David Rhey
ing is still not being enforced and the user is able > >> to > >> launch 1000s of jobs. > >> > >> I also ran 'scontrol reconfig' and even restarted slurmd on the computes > >> but no luck. I'm on 17.11. Are there additional steps

Re: [slurm-users] Maxjobs not being enforced

2019-09-17 Thread David Rhey
red > storage. This setting is still not being enforced and the user is able to > launch 1000s of jobs. > > I also ran 'scontrol reconfig' and even restarted slurmd on the computes > but no luck. I'm on 17.11. Are there additional steps to limit a user? > > Be
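Two things are often worth checking when a per-user `MaxJobs` limit seems to be ignored. This is offered as a hedged sketch, not as the resolution reached in this thread. First, association limits are only enforced when `AccountingStorageEnforce` in `slurm.conf` includes `limits`. Second, `MaxJobs` caps *running* jobs, while `MaxSubmitJobs` caps how many jobs a user may have submitted (pending plus running). The Python below assumes the controller's `slurm.conf` can be read as text. The helper names and the user `alice` are hypothetical. The command list follows ordinary `sacctmgr modify user ... set` syntax.

```python
import shlex


def enforce_options(slurm_conf_text: str) -> set[str]:
    """Collect AccountingStorageEnforce values from slurm.conf text.

    Keys are compared case-insensitively, and every whitespace-separated
    token is inspected because a line may carry several Key=Value pairs.
    """
    options: set[str] = set()
    for raw_line in slurm_conf_text.splitlines():
        line = raw_line.split("#", 1)[0]  # drop comments
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep and key.lower() == "accountingstorageenforce":
                # Keep the last occurrence found in the file.
                options = {v.strip().lower() for v in value.split(",") if v.strip()}
    return options


def limits_enforced(slurm_conf_text: str) -> bool:
    """True if the `limits` keyword is present (needed for MaxJobs to apply)."""
    return "limits" in enforce_options(slurm_conf_text)


def user_job_limit_cmd(user: str, max_running: int, max_submit: int) -> list[str]:
    """Build a sacctmgr call capping running and submitted jobs for one user."""
    return [
        "sacctmgr", "-i", "modify", "user", "where", f"name={user}",
        "set", f"MaxJobs={max_running}", f"MaxSubmitJobs={max_submit}",
    ]


if __name__ == "__main__":
    # Hypothetical config fragment that enforces associations but not limits.
    conf = "AccountingStorageEnforce=associations\n"
    print("limits enforced:", limits_enforced(conf))
    print(shlex.join(user_job_limit_cmd("alice", 50, 200)))
```

The `-i` flag makes `sacctmgr` commit immediately instead of prompting for confirmation.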

[slurm-users] oddity with users showing in sacctmgr and sreport

2019-09-12 Thread David Rhey
they aren't a part of the root hierarchy in sacctmgr. We're using 18.08.7. Thanks!

Re: [slurm-users] No error/output/run

2019-07-24 Thread David Rhey
ob 1277 > $ squeue > JOBID PARTITION NAME USER ST TIME NODES > NODELIST(REASON) > $ ls > in.lj slurm_script.sh > $ > > > What does that mean? > > Regards, > Mahmood

Re: [slurm-users] Cluster-wide GPU Per User limit

2019-07-17 Thread David Rhey
r modify cluster slurm_cluster set MaxTRESPerUser=gres/gpu=2 > Unknown option: MaxTRESPerUser=gres/gpu=2 > Use keyword 'where' to modify condition > > > Thanks! > > -- > Mike Harvey > Systems Administrator > Engineering Computing > Bucknell Univ
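The error above is consistent with `MaxTRESPerUser` being a QOS limit in `sacctmgr`, not an option that can be set on a cluster or association. A cluster-wide cap is therefore usually expressed on a QOS that every job runs under, for example the default `normal` QOS. The sketch below builds that command. It is an assumption, not the answer given in the thread. It also assumes that `gres/gpu` is listed in `AccountingStorageTRES`, so GPUs are tracked, and that `AccountingStorageEnforce` includes `limits`. The helper names are hypothetical.

```python
import shlex


def tres_spec(limits: dict[str, int]) -> str:
    """Render e.g. {"cpu": 64, "gres/gpu": 2} in Slurm's TRES list syntax."""
    return ",".join(f"{name}={count}" for name, count in limits.items())


def qos_per_user_tres_cmd(qos: str, limits: dict[str, int]) -> list[str]:
    """Build a sacctmgr call that caps per-user TRES on one QOS."""
    if not limits:
        raise ValueError("at least one TRES limit is required")
    return [
        "sacctmgr", "-i", "modify", "qos", "where", f"name={qos}",
        "set", f"MaxTRESPerUser={tres_spec(limits)}",
    ]


if __name__ == "__main__":
    # Hypothetical: at most two GPUs per user on the "normal" QOS.
    print(shlex.join(qos_per_user_tres_cmd("normal", {"gres/gpu": 2})))
```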

Re: [slurm-users] Invalid qos specification

2019-07-15 Thread David Rhey
n error: > > $ salloc -p general -q debug -t 00:30:00 > salloc: error: Job submit/allocate failed: Invalid qos specification > > I'm sure I'm overlooking something obvious. Any idea what that may be? > I'm using slurm 18.08.8 on the slurm controller, and the clients
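A common cause of `Invalid qos specification` is that the requested QOS (here `debug`) is not in the QOS list of the association the job runs under. One way to check is `sacctmgr -n -P show assoc where user=<user> format=account,partition,qos`, which prints pipe-separated rows with no header. The sketch below parses that output. It is a simplification: it assumes a partition-specific association takes precedence and that an empty partition column means the association is not tied to one partition. It does not reproduce Slurm's full association-selection rules. The names and sample data are hypothetical.

```python
from typing import NamedTuple


class Assoc(NamedTuple):
    account: str
    partition: str  # empty string: association not tied to one partition
    qos: frozenset[str]


def parse_assocs(output: str) -> list[Assoc]:
    """Parse `sacctmgr -n -P show assoc ... format=account,partition,qos`."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        account, partition, qos = line.strip().split("|")  # -P: no trailing pipe
        rows.append(Assoc(account, partition, frozenset(q for q in qos.split(",") if q)))
    return rows


def qos_allowed(rows: list[Assoc], account: str, partition: str, qos: str) -> bool:
    """Assumed rule: use a partition-specific association if one exists,
    otherwise fall back to the account's partition-less association."""
    matching = [r for r in rows if r.account == account and r.partition == partition]
    if not matching:
        matching = [r for r in rows if r.account == account and r.partition == ""]
    return any(qos in r.qos for r in matching)
```

If the QOS turns out to be missing, `sacctmgr modify user where name=<user> set qos+=debug` appends it to the user's QOS list. The `+=` form adds to the existing list instead of replacing it.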

[slurm-users] Question on billing tres information from sacct, sshare, and scontrol

2019-02-21 Thread David Rhey
le of theories, and have been looking through source code to try and understand a bit better. For context, I am trying to understand what a job costs, and what usage for an account over a span of say a month costs. Any insight is most appreciated!
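On the cost question, the `billing` TRES is derived from the partition's `TRESBillingWeights`. By default, Slurm sums each allocated TRES multiplied by its weight. With `PriorityFlags=MAX_TRES` a max-based formula applies instead, and this sketch does not model that case. A suffix on a weight changes the memory unit. Per the slurm.conf documentation, `mem=.25` on an 8 GB job bills 2048 (8192 MB × .25), while `mem=.25G` bills 2. The Python below assumes allocations are given in Slurm's base units (memory in MB, lowercase keys such as `cpu`, `mem`, `gres/gpu`) and handles only a `G` suffix on `mem`. The function names are hypothetical. Slurm reports billing as an integer, so its figure may differ from this float by rounding.

```python
def parse_weights(spec: str) -> dict[str, tuple[float, str]]:
    """Parse e.g. "CPU=1.0,Mem=0.25G,GRES/gpu=2.0" into {name: (weight, suffix)}."""
    weights: dict[str, tuple[float, str]] = {}
    for item in spec.split(","):
        name, _, raw = item.strip().partition("=")
        suffix = raw[-1].upper() if raw and raw[-1].isalpha() else ""
        number = raw[:-1] if suffix else raw
        weights[name.lower()] = (float(number), suffix)
    return weights


def billing(weights_spec: str, alloc: dict[str, float]) -> float:
    """Default (non-MAX_TRES) billing: sum of weight * allocated amount."""
    total = 0.0
    for name, (weight, suffix) in parse_weights(weights_spec).items():
        amount = alloc.get(name, 0.0)
        if suffix:
            # This sketch only understands a per-GB weight on memory.
            if name != "mem" or suffix != "G":
                raise ValueError(f"{name}={suffix!r} suffix not handled in this sketch")
            amount /= 1024  # allocation is in MB, weight is per GB
        total += weight * amount
    return total


def billing_seconds(billing_value: float, elapsed_s: int) -> float:
    """Billing-weighted seconds for one job, before any fair-share decay."""
    return billing_value * elapsed_s
```

Summing `billing_seconds` over an account's jobs in a month gives a rough usage figure in the same units as `sshare`'s raw usage, which is billing-weighted seconds. `sshare` applies `PriorityDecayHalfLife` decay on top of that.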

Re: [slurm-users] How to request ONLY one CPU instead of one socket or one node?

2019-02-15 Thread David Rhey
===== > > Where wcnqn.auto.pl is my program. 9625 denotes the species number.

Re: [slurm-users] External provisioning for accounts and other things (?)

2018-09-19 Thread David Rhey
Thanks! I'll check this out. Y'all are awesome for the responses. On Wed, Sep 19, 2018 at 7:57 AM Chris Samuel wrote: > On Wednesday, 19 September 2018 5:00:58 AM AEST David Rhey wrote: > > > First time caller, long-time listener. Does anyone use any sort of > exter

Re: [slurm-users] External provisioning for accounts and other things (?)

2018-09-18 Thread David Rhey
couple of the underlying libraries (Perl wrappers around sacctmgr and > sshare commands) are available on CPAN (Slurm::Sacctmgr, Slurm::Sshare); > the rest lack the polish and finish required for publishing on CPAN. > > On Tue, Sep 18, 2018 at 3:02 PM David Rhey wrote: > >>
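The same idea of wrapping `sacctmgr` can be sketched in Python as a reconciliation step. Compare the user/account pairs that an external source of truth says should exist with those Slurm already has, then emit only the commands needed to close the gap. This is a hypothetical sketch. It is not the CPAN modules mentioned above, and it does not describe how anyone in the thread did it. It assumes the accounts already exist, reads current pairs from `sacctmgr -n -P show assoc format=user,account` output, and never removes `root`.

```python
Pair = tuple[str, str]  # (user, account)


def current_pairs(sacctmgr_output: str) -> set[Pair]:
    """Parse `sacctmgr -n -P show assoc format=user,account` output."""
    pairs: set[Pair] = set()
    for line in sacctmgr_output.splitlines():
        user, _, account = line.strip().partition("|")
        if user and account:  # account-level rows have an empty user field
            pairs.add((user, account))
    return pairs


def reconcile(desired: set[Pair], current: set[Pair],
              protected: frozenset[str] = frozenset({"root"})) -> list[list[str]]:
    """Return sacctmgr commands that would turn `current` into `desired`."""
    cmds: list[list[str]] = []
    for user, account in sorted(desired - current):
        cmds.append(["sacctmgr", "-i", "add", "user", user, f"account={account}"])
    for user, account in sorted(current - desired):
        if user in protected:
            continue  # never drop protected users' associations automatically
        cmds.append(["sacctmgr", "-i", "remove", "user", "where",
                     f"name={user}", f"account={account}"])
    return cmds
```

Emitting commands instead of running them keeps a dry run trivial. A wrapper can print the list for review before passing each entry to `subprocess.run`.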

[slurm-users] External provisioning for accounts and other things (?)

2018-09-18 Thread David Rhey
d be extra interested in how you achieved that. Thanks!