Dear Users,
May I have your guidance on how to run multiple jobs on the servers?
We have 2 servers, Platinum and Cerium.
1. When I launch 2 jobs on Platinum, the tool launches successfully and
distributes the jobs to the 2 different servers, but while launching the 3rd
job, the resource request stays in the queue.
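(A hedged sketch, not part of the original question, assuming these jobs go
through Slurm; the job ID and format string are only illustrative.) When the
3rd job sits in the queue, the scheduler can report why:

    # List my jobs with their state and, for pending jobs, the reason they wait.
    squeue -u "$USER" -o "%.10i %.12T %.30R"

    # Inspect one job in detail (12345 is a placeholder job ID) and look for
    # the "Reason=" field, e.g. Resources or Priority.
    scontrol show job 12345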
Hi,
We are using Slurm 21.08. We are curious to know how to use the "sbank" utility
for crediting GPU hours, just like CPU minutes, and also how to get the status
of GPU hours credited, used, etc.
Actually, the sbank utility from GitHub does not have the functionality for
adding / querying GPU hours.
Is there any other means?
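(A hedged sketch, not an answer given in the thread: Slurm itself can hold and
report a GPU-hour budget through TRES limits, assuming gres/gpu is listed in
AccountingStorageTRES in slurm.conf; "myproject" is a placeholder account name.)

    # Credit a GPU-hour budget to an account: 500 GPU hours = 30000 GPU minutes.
    sacctmgr modify account myproject set GrpTRESMins=gres/gpu=30000

    # Report GPU hours consumed by that account over a period.
    sreport cluster AccountUtilizationByUser account=myproject \
            tres=gres/gpu start=2023-01-01 end=2023-12-31 -t hours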
On 10-12-2023 17:29, Ryan Novosielski wrote:
> This is basically always somebody filling up /tmp and /tmp residing on the
> same filesystem as the actual SlurmdSpoolDirectory.
> /tmp, without modifications, is almost certainly the wrong place for
> temporary HPC files. Too large.
Agreed! That's why we maintain /tmp as a separate partition on all nodes to
mitigate this exact scenario, though it doesn't necessarily need to be part of
the primary system RAID. There's no need for /tmp resiliency.
Regards,
Peter
Peter Goode
Research Computing Systems Administrator
Lafayette College
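(An illustrative check, not part of Peter's message: confirm whether /tmp and
the slurmd spool directory share a filesystem on a node; the spool path is read
from the running config.)

    # Print the filesystems backing /tmp and SlurmdSpoolDir; if both lines show
    # the same device, a runaway /tmp can fill the spool area as described above.
    SPOOL=$(scontrol show config | awk '/SlurmdSpoolDir/ {print $3}')
    df -h /tmp "$SPOOL"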
On Dec 8, 2023, at 10:02, Xaver Stiensmeier wrote:
Hello Brian Andrus,
we ran 'df -h' to determine the amount of free space I mentioned below.
I should also add that at the time we inspected the node, there was still
around 38 GB of space left; however, we were unable to watch the remaining
space while the error occurred, so maybe the large file(
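(A possible follow-up sketch, not part of the original message: logging free
space at short intervals captures a transient fill-up even if the offending
file is removed again afterwards; the spool path and log location are
assumptions.)

    # Record free space on /tmp and the slurmd spool filesystem every 10 seconds.
    while true; do
        { date; df -h /tmp /var/spool/slurmd; } >> /root/df-watch.log
        sleep 10
    done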