On 10-12-2023 17:29, Ryan Novosielski wrote:
This is basically always somebody filling up /tmp and /tmp residing on the same
filesystem as the actual SlurmdSpoolDirectory.
/tmp, without modifications, is almost certainly the wrong place for
temporary HPC files. They are too large.
Agreed! That's w
We maintain /tmp as a separate partition on all nodes to mitigate this exact
scenario, though it doesn’t necessarily need to be part of the primary system
RAID. There is no need for /tmp resiliency.
Regards,
Peter
Peter Goode
Research Computing Systems Administrator
Lafayette College
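
The failure mode described above (/tmp sharing a filesystem with the slurmd spool directory) can be checked per node with a short script. This is only a minimal sketch: the spool path used here is an assumption (the documented default), so substitute whatever `scontrol show config` reports for SlurmdSpoolDir on your nodes.

```python
import os

# Assumed paths, adjust for your site. SPOOL_DIR is the documented default;
# confirm the real value with: scontrol show config | grep SlurmdSpoolDir
TMP_DIR = "/tmp"
SPOOL_DIR = "/var/spool/slurmd"


def same_filesystem(path_a, path_b):
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    try:
        shared = same_filesystem(TMP_DIR, SPOOL_DIR)
    except OSError as err:
        # One of the assumed paths does not exist or is not readable here.
        print(f"could not stat path: {err}")
    else:
        if shared:
            print(f"{TMP_DIR} and {SPOOL_DIR} share a filesystem; "
                  "a full /tmp can fill the spool directory")
        else:
            print(f"{TMP_DIR} and {SPOOL_DIR} are on separate filesystems")
```

If the two paths share a device, a job that fills /tmp will also exhaust the space slurmd needs for job state.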
> On Dec 10, 2023,
This is basically always somebody filling up /tmp and /tmp residing on the same
filesystem as the actual SlurmdSpoolDirectory.
/tmp, without modifications, is almost certainly the wrong place for
temporary HPC files. They are too large.
> On Dec 8, 2023, at 10:02, Xaver Stiensmeie
Hello Brian Andrus,
we ran 'df -h' to determine the amount of free space I mentioned below.
I also should add that at the time we inspected the node, there was
still around 38 GB of space left - however, we were unable to watch the
remaining space while the error occurred so maybe the large file(
Xaver,
It is likely your /var or /var/spool mount.
That may be a separate partition or part of your root partition. It is
the partition that is full, not the directory itself. So the cause could
very well be log files in /var/log. I would check to see what (if any)
partitions are getting fille
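
A rough way to see which partition is actually filling up, roughly equivalent to reading 'df -h' for a few paths, is sketched below. The list of paths is an assumption taken from the directories named in this thread; paths that do not exist on a node are skipped.

```python
import shutil

# Candidate paths named in this thread; adjust for your nodes (assumption).
CANDIDATE_PATHS = ["/", "/tmp", "/var", "/var/log", "/var/spool"]


def usage_report(paths):
    """Return (path, total_bytes, free_bytes, percent_used) per existing path."""
    rows = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            # Path is missing or unreadable on this node; skip it.
            continue
        # Guard against pseudo-filesystems that report a total size of 0.
        percent = 100.0 * usage.used / usage.total if usage.total else 0.0
        rows.append((path, usage.total, usage.free, percent))
    return rows


if __name__ == "__main__":
    gib = 1024 ** 3
    for path, total, free, percent in usage_report(CANDIDATE_PATHS):
        print(f"{path:<12} total {total / gib:8.1f} GiB  "
              f"free {free / gib:8.1f} GiB  used {percent:5.1f}%")
```

Several paths may report identical numbers; that indicates they sit on the same partition, which is the situation described above.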
Hi Xaver,
On 12/8/23 16:00, Xaver Stiensmeier wrote:
during a larger cluster run (the same one I mentioned earlier, with 242 nodes), I
got the error "SlurmdSpoolDir full". The SlurmdSpoolDir is apparently a
directory on the workers that is used for job state information
(https://slurm.schedmd.com/slurm.co
Dear slurm-user list,
during a larger cluster run (the same one I mentioned earlier, with 242 nodes), I
got the error "SlurmdSpoolDir full". The SlurmdSpoolDir is apparently a
directory on the workers that is used for job state information
(https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir). How