In addition to the above problem: oversubscription is set to NO, so according to
the documentation, in this scenario jobs from the other partition are not
accepted even when resources are available. I even set the same priority for
both partitions, but it didn't help. Any suggestions here?
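For reference, the effective partition settings and the reason the scheduler
gives for pending jobs can be checked with something like this (the squeue
format string is only an example):
  scontrol show partition small_jobs
  scontrol show partition large_jobs
  # %R prints the scheduler's reason for leaving each job pending
  squeue -t PENDING -o "%.10i %.9P %.8T %R"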
If you use cgroups, tmpfs usage in /tmp and /dev/shm is counted against the
memory requested for the job.
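Roughly, the settings involved are (a sketch, not a complete config):
  # slurm.conf: enforce job limits through the cgroup task plugin
  TaskPlugin=task/cgroup
  # cgroup.conf: pages written to tmpfs (/tmp, /dev/shm) are charged to the
  # job's memory cgroup, so they count against the requested memory
  ConstrainRAMSpace=yes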
On Tue, Mar 31, 2020 at 4:51 PM Ellestad, Erik wrote:
> How are folks managing allocation of local TmpDisk for jobs?
>
> We see how you define the location of TmpFs in slurm.conf.
>
> And then
How are folks managing allocation of local TmpDisk for jobs?
We see how you define the location of TmpFs in slurm.conf.
And then how the amount per host is defined via TmpDisk.
Then the request for srun/sbatch via --tmp=X
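For concreteness, the pieces above look roughly like this (paths and sizes are
only examples):
  # slurm.conf
  TmpFS=/scratch                                                 # location of node-local temp space
  NodeName=n[001-004] CPUs=32 RealMemory=192000 TmpDisk=900000   # per-node TmpDisk, in MB
  # job request
  sbatch --tmp=100G job.sh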
However, it appears SLURM only checks the defined TmpDisk amount when al
Hi, Yair,
Out of curiosity have you checked to see if this is a runaway job?
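Runaway jobs can be listed with something like (the job id is a placeholder):
  # jobs the accounting database still marks as running but the controller no longer knows about
  sacctmgr show runawayjobs
  # the job's state according to accounting
  sacct -j <jobid> -o JobID,State,Elapsed,End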
David
On Tue, Mar 31, 2020 at 7:49 AM Yair Yarom wrote:
> Hi,
>
> We have an issue where, after running srun (with --pty zsh) and rebooting
> the node (from a different shell), srun reports:
> srun: error: eio_message_s
Hi,
We have an issue where, after running srun (with --pty zsh) and rebooting the
node (from a different shell), srun reports:
srun: error: eio_message_socket_accept: slurm_receive_msg[an.ip.addr.ess]:
Zero Bytes were transmitted or received
and hangs.
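In case a concrete sequence helps (the node name and the way the reboot is
issued are just placeholders):
  # shell 1: start the interactive step
  srun --pty zsh
  # shell 2: reboot the node the step landed on
  ssh nodeXX sudo reboot
  # shell 1 then prints the eio_message_socket_accept error above and hangs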
After the node boots, Slurm claims that jo
Hi,
I have an issue with resource allocation.
In the environment I have partitions like below:
PartitionName=small_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE State=UP Shared=YES Priority=8000
PartitionName=large_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE State=UP Shared=YES Pri
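For reference, a minimal sketch of two partitions sharing the same nodes with
oversubscription enabled, assuming the newer spellings OverSubscribe= (formerly
Shared=) and PriorityTier= (one of the replacements for the partition Priority=
option); the numbers are only examples:
  PartitionName=small_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE State=UP OverSubscribe=YES PriorityTier=8000
  PartitionName=large_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE State=UP OverSubscribe=YES PriorityTier=8000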