[slurm-users] Re: slurm releases

2025-04-05 Thread Ryan Novosielski via slurm-users
-- #BlackLivesMatter || \\UTGERS, |---*O*--- ||_// the State | Ryan Novosielski - novos...@rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\of NJ | Office of Advanced Research

[slurm-users] Re: Cloud elastic help

2025-01-29 Thread Ryan Novosielski via slurm-users
hen executed).

[slurm-users] Re: sinfo not listing any partitions

2024-11-27 Thread Ryan Novosielski via slurm-users
At this point, I’d probably crank up the logging some and see what it’s saying in slurmctld.log.

[slurm-users] Re: sinfo not listing any partitions

2024-11-27 Thread Ryan Novosielski via slurm-users
On Nov 27, 2024, at 09:56, Kent L. Hanson via slurm-users

[slurm-users] Re: A note on updating Slurm from 23.02 to 24.05 & multi-cluster

2024-09-26 Thread Ryan Novosielski via slurm-users

[slurm-users] Re: SlurmDBD errors

2024-09-18 Thread Ryan Novosielski via slurm-users

[slurm-users] Re: Unsupported RPC version by slurmctld 19.05.3 from client slurmd 22.05.11

2024-06-17 Thread Ryan Novosielski via slurm-users
The benefits are pretty limited if you don’t have the server upgraded anyway, unless you’re just saying it’s easier to install a current client.

[slurm-users] Re: diagnosing why interactive/non-interactive job waits are so long with State=MIXED

2024-06-04 Thread Ryan Novosielski via slurm-users
quick response Ryan! Are there any recommendations for bf_ options from https://slurm.schedmd.com/sched_config.html that could help with this? bf_continue? Decreasing bf_interval= to a value lower than 30? On Tue, Jun 4, 2024 at 4:13 PM Ryan Novosielski <novos...@rutgers.edu>

[slurm-users] Re: diagnosing why interactive/non-interactive job waits are so long with State=MIXED

2024-06-04 Thread Ryan Novosielski via slurm-users
This is relatively true of my system as well, and I believe it’s that the backfill scheduler is slower than the main scheduler.

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Ryan Novosielski via slurm-users
un. I suspect it is doing something.

[slurm-users] Re: Jobs showing running but not running

2024-05-29 Thread Ryan Novosielski via slurm-users
One of the other states (down or fail, from memory) should cause it to completely drop the job.

[slurm-users] Re: Removing safely a node

2024-05-16 Thread Ryan Novosielski via slurm-users

[slurm-users] Re: Recover Batch Script Error

2024-02-16 Thread Ryan Novosielski via slurm-users
On Feb 16, 2024, at 14:41, Jason Simms via slurm-users wrote: Hello all, I'v

Re: [slurm-users] error: Couldn't find the specified plugin name for cred/munge looking at all files

2024-01-23 Thread Ryan Novosielski
Ah, I see. No, it’s 24.08. That’s why I didn’t find any reference to it. Carry on! :-D

Re: [slurm-users] error: Couldn't find the specified plugin name for cred/munge looking at all files

2024-01-23 Thread Ryan Novosielski

Re: [slurm-users] sacct --name --status filtering

2024-01-10 Thread Ryan Novosielski

Re: [slurm-users] SlurmdSpoolDir full

2023-12-10 Thread Ryan Novosielski
This is basically always somebody filling up /tmp, with /tmp residing on the same filesystem as the actual SlurmdSpoolDir. Without modifications, /tmp is almost certainly the wrong place for temporary HPC files. Too large. Sent from my iPhone > On Dec 8, 2023, at 10:02, Xaver Stiensmeie

Re: [slurm-users] Time spent in PENDING/Priority

2023-12-07 Thread Ryan Novosielski
On Dec 7, 2023, at 15:09, Chip Seraphine

Re: [slurm-users] SLURM new user query, does SLURM has GUI /Web based management version also

2023-11-28 Thread Ryan Novosielski
It primarily does other things, but you can interact with Slurm in Open OnDemand.

Re: [slurm-users] ReservedCoresPerGPU

2023-11-27 Thread Ryan Novosielski
, 2023, at 5:34 PM, Ryan Novosielski wrote: Looks like 24.08 to me, so s/introduced/introduces.

Re: [slurm-users] ReservedCoresPerGPU

2023-11-27 Thread Ryan Novosielski
Looks like 24.08 to me, so s/introduced/introduces.

Re: [slurm-users] slurm comunication between versions

2023-11-24 Thread Ryan Novosielski
What do you mean by management node, slurmctld? Or just a node with the client software on it?

Re: [slurm-users] ulimits

2023-11-16 Thread Ryan Novosielski
The pam_slurm.so module has an impact on these values, if you are using it.

Re: [slurm-users] cpus-per-task behaviour of srun after 22.05

2023-10-22 Thread Ryan Novosielski
What we say at our site is that you should use srun. If you don’t use srun, you will see limited, if any, output on resource usage in the various places you can see it (sacct, etc.), and I learned recently that sattach won’t work either. I find it’s also easier to make mistakes with resource use

Re: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)

2023-10-13 Thread Ryan Novosielski

Re: [slurm-users] A strange situation of different network cards on the same network

2023-10-10 Thread Ryan Novosielski

Re: [slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Ryan Novosielski
You can get some information on that from sdiag, and there are tweaks you can make to backfill scheduling that affect how quickly it will get to a job. That doesn’t really answer your real question, but might help you when you are looking into this. Sent from my iPhone On Sep 29, 2023, at 16:1

Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Ryan Novosielski
outs, it’s pretty uneventful. You won’t have that long database upgrade period, since no database modifications will be required, so it’s pretty much like upgrading anything else.

Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-28 Thread Ryan Novosielski

Re: [slurm-users] enabling job script archival

2023-09-28 Thread Ryan Novosielski
is what I’d ask for. I assume that archiving, in general, would also remove this stuff, since old jobs themselves will be removed?

Re: [slurm-users] enabling job script archival

2023-09-28 Thread Ryan Novosielski
for. I assume that archiving, in general, would also remove this stuff, since old jobs themselves will be removed?

Re: [slurm-users] How to use partition option "Hidden"?

2023-08-24 Thread Ryan Novosielski
te these things.

Re: [slurm-users] Transport from SLC to Provo?

2023-08-19 Thread Ryan Novosielski
Or an airport hotel for the first night. Done that many times. Sent from my iPhone On Aug 19, 2023, at 13:53, Lloyd Brown wrote: Something else to consider that I just thought of. If you're arriving late on Sunday, and SLUG doesn't start until Tuesday, you could just get a hotel in SLC some

Re: [slurm-users] extended list of nodes allocated to a job

2023-08-17 Thread Ryan Novosielski
I didn’t know that one! Thank you. Sent from my iPhone On Aug 17, 2023, at 09:50, Alain O' Miniussi wrote:  Hi Sean, A colleague pointed to me the following commands: #scontrol show hostname x[1000,1009,1029-1031] x1000 x1009 x1029 x1030 x1031 #scontrol show hostlist x[1000,1009,1029,1030,10
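
The expansion that `scontrol show hostname` performs in the quoted message can be approximated in a few lines of Python. This is only a sketch under simplifying assumptions: `expand_hostlist` is a hypothetical helper, not part of Slurm, and it handles a single bracket group of comma-separated numbers and ranges rather than the full hostlist grammar.

```python
import re


def expand_hostlist(expr: str) -> list[str]:
    """Expand a simple hostlist such as 'x[1000,1009,1029-1031]'.

    Hypothetical helper for illustration; supports one bracket group only.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        # No bracket group: treat the expression as a single host name.
        return [expr]
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        low, _, high = part.partition("-")
        high = high or low  # a bare number is a one-element range
        width = len(low)    # keep zero padding, e.g. 001 through 008
        for n in range(int(low), int(high) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


# Prints one host name per line for the expression from the message.
print("\n".join(expand_hostlist("x[1000,1009,1029-1031]")))
```

On a real cluster the scontrol commands quoted above remain the authoritative way to do this, since they understand every form of hostlist that Slurm accepts.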

Re: [slurm-users] Temporary Stop User Submission

2023-05-25 Thread Ryan Novosielski
I tend not to let them login. It will get their attention, and prevent them from just running their work on the login node when they discover they can’t submit. But appreciate seeing the other options. Sent from my iPhone > On May 25, 2023, at 19:19, Markuske, William wrote: > >  Hello, > >

Re: [slurm-users] [External] Re: Slurm 22.05.8 - salloc not starting shell on remote host

2023-05-19 Thread Ryan Novosielski
a shell for salloc is a newer feature. For your version, you should: srun -n 1 -t 00:10:00 --mem=1G --pty bash Brian Andrus On 5/19/2023 8:24 AM, Ryan Novosielski wrote: I’m not at a computer, and we run an older version of Slurm yet so I can’t say with 100% confidence that this has

Re: [slurm-users] Slurm 22.05.8 - salloc not starting shell on remote host

2023-05-19 Thread Ryan Novosielski
I’m not at a computer, and we run an older version of Slurm yet so I can’t say with 100% confidence that this has changed and I can’t be too specific, but I know that this is the behavior you should expect from that command. I believe that there are configuration options to make it behave di

Re: [slurm-users] Migration of slurm communication network / Steps / how to

2023-04-23 Thread Ryan Novosielski
I think it’s easier than all of this. Are you actually changing names of all of these things, or just IP addresses? If they all resolve to an IP now and you can bring everything down and change the hosts files or DNS, it seems to me that if the names aren’t changing, that’s that. I know that “sc

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Ryan Novosielski
/pestat/pestat

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Ryan Novosielski
d the graphs look awesome! > Would you be willing to share the scripts you're using to generate > those reports? That sounds like something many sites could benefit > from! Agreed, same.

Re: [slurm-users] Upgrade from 20.11.0 to Slurm version 22.05.6 ?

2022-11-10 Thread Ryan Novosielski
We basically always do this. Just be mindful of how long it takes to upgrade your database (if you have the ability to do a dry run, you might want to do that). That’s true of any upgrade, though. If you have to skip more than one version, you’ll have to upgrade in stages. On Nov 10, 2022, at 7

Re: [slurm-users] Using "srun" on compute nodes -- Ray cluster

2022-07-15 Thread Ryan Novosielski
des. Would Slurm enforce limits properly ("qos" or "partition" limits)? Kind Regards

Re: [slurm-users] Need to restart slurmctld for gres jobs to start

2022-06-24 Thread Ryan Novosielski
topology plugin. We use this to keep jobs from spanning two different infiniband fabrics that are connected together via lower bandwidth than the rest of the fabric.

Re: [slurm-users] DBD Reset

2022-06-15 Thread Ryan Novosielski
much for pointing me in the correct direction. Thanks, Reed On Jun 15, 2022, at 7:50 PM, Ryan Novosielski <novos...@rutgers.edu> wrote: Apologies for not having more concrete information available when I’m replying to you, but I figured maybe having a fast hint might be better.

Re: [slurm-users] DBD Reset

2022-06-15 Thread Ryan Novosielski
Apologies for not having more concrete information available when I’m replying to you, but I figured maybe having a fast hint might be better. Have a look at how the various daemons communicate with one another. This sounds to me like a firewall thing between maybe the SlurmCtld and where the S

Re: [slurm-users] sbatch - accept jobs above limits

2022-02-08 Thread Ryan Novosielski
I’m not 100% certain that this affects this situation, but there’s a slurm.conf setting called EnforcePartLimits that you might want to change.

Re: [slurm-users] Upgrade from 17.02.11 to 21.08.2 and state information

2022-02-03 Thread Ryan Novosielski
e is lost. You don’t normally see that memory being used like that, because slurmdbd is normally up/accepting the accounting data.

Re: [slurm-users] srun : Communication connection failure

2022-01-25 Thread Ryan Novosielski

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-14 Thread Ryan Novosielski
related to "squeue -O". May not work with Slurm 19.05 and older. :04 04 dee11077f72dd898dcadccf9d0dd2cfc438a8d1f 61880fe14a49a7a96167b89d21dede41f2751d86 M pestat > On Dec 14, 2021, at 4:29 PM, Ryan Novosielski wrote: > > Hi Ole, > > Thanks again for your great

Re: [slurm-users] How to get an estimate of job completion for planned maintenance?

2021-12-14 Thread Ryan Novosielski
date/time.

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-14 Thread Ryan Novosielski
* 128000 116325 You can see Joblist and JobID User are not present.

Re: [slurm-users] [External] Re: PropagateResourceLimits

2021-04-29 Thread Ryan Novosielski
rentice > > > On 4/22/21 10:55 AM, Ryan Novosielski wrote: >> My recollection is that this parameter is talking about “ulimit” parameters, >> and doesn’t have to do with cgroups. The documentation is not as clear here >> as it could be, about what this does, the mec

Re: [slurm-users] PropagateResourceLimits

2021-04-22 Thread Ryan Novosielski
My recollection is that this parameter is talking about “ulimit” parameters, and doesn’t have to do with cgroups. The documentation is not as clear here as it could be, about what this does, the mechanism by which it’s applied (PAM module), etc. Sent from my iPhone > On Apr 22, 2021, at 09:07

Re: [slurm-users] Jobs that may still be running at X time?

2021-04-16 Thread Ryan Novosielski
> On Apr 16, 2021, at 6:21 PM, Juer

[slurm-users] Jobs that may still be running at X time?

2021-04-16 Thread Ryan Novosielski
nning. Anyway, I figure this is something people probably need to know often enough. Any tips?

Re: [slurm-users] Exclude Slurm packages from the EPEL yum repository

2021-02-03 Thread Ryan Novosielski

Re: [slurm-users] Exclude Slurm packages from the EPEL yum repository

2021-01-24 Thread Ryan Novosielski

Re: [slurm-users] Compute node process monitoring tools updated

2021-01-19 Thread Ryan Novosielski
Thanks, that’s great! I do a lot of that by hand (including lots over this weekend), so it will be a nice timesaver.

Re: [slurm-users] Moving Slurmctld and slurmdbd to a new host

2021-01-15 Thread Ryan Novosielski

Re: [slurm-users] [EXT] GPU Jobs with Slurm

2021-01-15 Thread Ryan Novosielski

Re: [slurm-users] [EXT] GPU Jobs with Slurm

2021-01-14 Thread Ryan Novosielski
AFAIK, if you have this set up correctly, nvidia-smi will be restricted too, though I think we were seeing a bug there at one time in this version.

Re: [slurm-users] [External] Re: can't lengthen my jobs log

2020-12-04 Thread Ryan Novosielski
As root, -a is effectively applied to every command I’m aware of.

Re: [slurm-users] How to contact slurm developers

2020-09-30 Thread Ryan Novosielski
I’ve previously seen code contributed back in that way. See bug 1611 as an example (happened to have looked at that just yesterday).

Re: [slurm-users] How to contact slurm developers

2020-09-30 Thread Ryan Novosielski
On Sep 30, 2020, at 10:57, Relu Patrascu wrote: Hi all

Re: [slurm-users] EXTERNAL: Re: Memory per CPU

2020-09-30 Thread Ryan Novosielski
Absolutely not. It’s recommended.

Re: [slurm-users] EXTERNAL: Re: Memory per CPU

2020-09-30 Thread Ryan Novosielski
, in the case of mpirun, etc.).

Re: [slurm-users] Slurm -- using GPU cards with NVLINK

2020-09-10 Thread Ryan Novosielski
NodeName=cuda[001-008] Name=gpu File=/dev/nvidia[2-3] CPUs=12-23 This also seems to be related: https://slurm.schedmd.com/SLUG19/GPU_Scheduling_and_Cons_Tres.pdf

Re: [slurm-users] is there a way to delay the scheduling.

2020-08-28 Thread Ryan Novosielski

Re: [slurm-users] GRES Restrictions

2020-08-25 Thread Ryan Novosielski
Sorry about that. “NJT” should have read “but;” apparently my phone decided I was talking about our local transit authority. 😓 On Aug 25, 2020, at 10:30, Ryan Novosielski wrote:  I believe that’s done via a QoS on the partition. Have a look at the docs there, and I think “require” is a good

Re: [slurm-users] GRES Restrictions

2020-08-25 Thread Ryan Novosielski

Re: [slurm-users] Jobs killed by OOM-killer only on certain nodes.

2020-07-02 Thread Ryan Novosielski

Re: [slurm-users] slurmd -C showing incorrect core count

2020-03-12 Thread Ryan Novosielski
getting it from the VM somehow.

Re: [slurm-users] Slurm 19.05 X11-forwarding

2020-02-25 Thread Ryan Novosielski
heir issues. I have used the >>> "export=DISPLAY, HOME" as an additional argument for srun but >>> without any progress. Anyone with a similar problem who can aid >>> or advise me on how to use the X11Forward feature? Any help is >>> much appreciat

Re: [slurm-users] Node can't run simple job when STATUS is up and STATE is idle

2020-01-20 Thread Ryan Novosielski
The node is not getting the status from itself, it’s querying the slurmctld to ask for its status.

Re: [slurm-users] Downgraded to slurm 19.05.4 and now slrumctld won't start because of incompatible state

2020-01-20 Thread Ryan Novosielski
Check slurm.conf for StateSaveLocation. https://slurm.schedmd.com/slurm.conf.html
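
Looking up that setting can be scripted. The following is a minimal sketch, assuming the usual `Key=Value` layout of slurm.conf with `#` comments. `find_setting` and the sample text are hypothetical, the path in the sample is made up for illustration, and `Include` directives are not followed.

```python
def find_setting(conf_text: str, key: str):
    """Return the last value assigned to `key` in slurm.conf-style text, or None.

    Hypothetical helper: parameter names are matched case-insensitively and
    anything after '#' is treated as a comment.
    """
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        name, sep, rest = line.partition("=")
        if sep and name.strip().lower() == key.lower():
            value = rest.strip()
    return value


# Sample text with a made-up path, for illustration only.
sample = """
# slurm.conf (excerpt)
ClusterName=example
StateSaveLocation=/var/spool/slurmctld  # illustrative path
"""
print(find_setting(sample, "StateSaveLocation"))
```

On a running cluster, `scontrol show config` should also report the StateSaveLocation value that slurmctld is actually using.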

Re: [slurm-users] Is that possible to submit jobs to a Slurm cluster right from a developer's PC

2019-12-11 Thread Ryan Novosielski
On Dec 11, 2019, at 22:41, Victor (Weikai)

[slurm-users] Array jobs vs. many jobs

2019-11-22 Thread Ryan Novosielski
'm not sure it makes any difference here)

Re: [slurm-users] Get GPU usage from sacct?

2019-11-14 Thread Ryan Novosielski
Do you mean akin to what some would consider "CPU efficiency" on a CPU job? "How much... used" is a little vague. From: slurm-users on behalf of Prentice Bisbal Sent: Thursday, November 14, 2019 13:41 To: Slurm User Community List Subject: [slurm-users]

Re: [slurm-users] Slurm node weights

2019-07-25 Thread Ryan Novosielski
IS an interaction of some sort.

Re: [slurm-users] Hide Filesystem From Slurm

2019-07-11 Thread Ryan Novosielski

Re: [slurm-users] ConstrainRAMSpace=yes and page cache?

2019-06-21 Thread Ryan Novosielski
epool separate to the processes address space. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Increasing job priority based on resources requested.

2019-04-18 Thread Ryan Novosielski
want? I’m not so sure. How soon will someone figure out that they might get a higher priority based on requesting some feature they don’t need?

Re: [slurm-users] X11 forwarding and VNC?

2019-03-25 Thread Ryan Novosielski
e have any ideas whether this can be made to work and, if > so, how?

Re: [slurm-users] Database Tuning w/SLURM

2019-03-22 Thread Ryan Novosielski
> On Mar 22, 2019, at 4:22 AM, Ole Holm Nielsen > wrote: > > On 3/21/19 6:56 PM, Ryan Novosielski wrote: >>> On Mar 21, 2019, at 12:21 PM, Loris Bennett >>> wrote: >>> >>> Our last cluster only hit around 2.5 million jobs after >>>

[slurm-users] Database Tuning w/SLURM (was: Re: SLURM heterogeneous jobs, a little help needed plz)

2019-03-21 Thread Ryan Novosielski
orting >24 hour database conversion times.

Re: [slurm-users] SLURM heterogeneous jobs, a little help needed plz

2019-03-21 Thread Ryan Novosielski
I’ve never seen a paycheck signed by “Best Practices”.

Re: [slurm-users] Topology configuration questions:

2019-01-22 Thread Ryan Novosielski
pology/tree plugin. >> """ >> >> So the Topology plugin does take precedence over the weighting >> algorithm, but it doesn't disable it, AFAIK. And for sites using >> disjoint networks, as we do, this is a sane behavior. >> >> Cheers, >

Re: [slurm-users] Topology configuration questions:

2019-01-18 Thread Ryan Novosielski
is intended for >> serial and low-core count parallel jobs) If I just leave those nodes out of >> the topology.conf file, will that have the desired effect of not allocating >> multi-node jobs to those nodes, or will it result in an error of some sort?

Re: [slurm-users] Topology configuration questions:

2019-01-18 Thread Ryan Novosielski
ound that, I guess, but by default, the behavior seems to be roughly the inverse of the node weights.

Re: [slurm-users] Topology configuration questions:

2019-01-17 Thread Ryan Novosielski
I don’t actually know the answer to this one, but we have it provisioned to all nodes. Note that if you care about node weights (eg. NodeName=whatever001 Weight=2, etc. in slurm.conf), using the topology function will disable it. I believe I was promised a warning about that in the future in a

Re: [slurm-users] Topology configuration questions:

2019-01-17 Thread Ryan Novosielski
it’s going to ignore the topology plugin, but I believe it works (and the documentation sure indicates it does).

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Ryan Novosielski
ver, what’s the advantage of “salloc --x11 srun” vs. just “srun --x11”?

Re: [slurm-users] salloc with bash scripts problem

2019-01-02 Thread Ryan Novosielski
then occasionally send srun commands over to it.

Re: [slurm-users] Wedged nodes from cgroups, OOM killer, and D state process

2018-12-07 Thread Ryan Novosielski
m/cgroup.conf > ConstrainCores=yes > ConstrainRAMSpace=yes > ConstrainSwapSpace=yes > > Cheers, > Chris > > — > Christopher Coffey > High-Performance Computing > Northern Arizona University > 928-523-1167

Re: [slurm-users] How to check the percent cpu of a job?

2018-11-21 Thread Ryan Novosielski
cpu of a job? >> >> Hello everyone, >> >> How to check the percent cpu of a job in slurm? I tried sacct, sstat, >> squeue, but I can't find that how to check. >> Can someone help me? >> >> Best regards, >> Yalei >>

Re: [slurm-users] Updated Slurm tool "pestat" (Processor Element status)

2018-11-21 Thread Ryan Novosielski
Thanks Olm! I am quite fond of your utilities — thank you for providing them. Sent from my iPhone > On Nov 21, 2018, at 08:51, Ole Holm Nielsen > wrote: > > Dear Slurm users, > > The Slurm tool "pestat" (Processor Element status) has been enhanced due to a > user request. Now pestat will d

Re: [slurm-users] Job allocating more CPUs than requested

2018-09-21 Thread Ryan Novosielski
set to offline such nodes, but that affects job preemption. What sort of choices do others make in this area?

[slurm-users] "Owner" field in scontrol show node?

2018-08-08 Thread Ryan Novosielski
/s ExtSensorsWatts=0 ExtSensorsTemp=n/s Reason=HDRT #1019681 [root@2018-08-06T12:14:44] Thanks!

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Ryan Novosielski
> On Jul 23, 2018, at 10:31 PM, Ian Mortimer wrote: > > On Tue, 2018-07-24 at 02:19 +0000, Ryan Novosielski wrote: > >> Best off running nvidia-persistenced. Handles all of this stuff as a >> side effect, and also enables persistence mode, provided you don’t >> con

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Ryan Novosielski
Best off running nvidia-persistenced. Handles all of this stuff as a side effect, and also enables persistence mode, provided you don’t configure it otherwise.
