Hey Suzanne,
In order to "combine" RAM between different systems, you will need a
hardware/software solution like ScaleMP, or you need a software framework
like OpenMPI. If your software is already written to use MPI then, in a
sense, it is "combining" the memory.
SLURM is a resource manager and
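As a rough sketch of what an MPI-style submission could look like in that
case (my_mpi_app and the exact figures are placeholders, not something from
your setup): --mem is per node, so three nodes at 340G each give roughly
1TB across the whole job.

#!/bin/bash
#SBATCH --job-name=mpi-bigmem
#SBATCH --nodes=3                # three nodes, each contributing its own RAM
#SBATCH --ntasks-per-node=16     # MPI ranks per node; adjust to your core count
#SBATCH --mem=340G               # memory per node, so ~1TB across the whole job
#SBATCH --output=mpi-bigmem.out

# srun starts the MPI ranks on all allocated nodes; the application itself
# must distribute its data across the ranks to actually use the combined RAM.
srun ./my_mpi_app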
Dear slurm community,
I am relatively new to Slurm and I am wondering whether it is possible
to combine RAM between different nodes (e.g., can I combine three nodes
with 384GB RAM each to run a job that requires 1TB RAM)? And if so, could
you please advise how to do so (I can't seem to find any documentation on this)?
No, Slurm goes strictly by what the job specifies for memory at submit
time. Slurm has no way of knowing how much memory a job might need in
the future. The only way to safely share a node is for Slurm to reserve
the requested memory for the duration of the job. To do otherwise would
be a disaster.
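As a quick illustration of reserved versus actually used memory, sacct can
show both after a job finishes (12345 is just a placeholder job ID; ReqMem
and MaxRSS are standard sacct fields):

$ sacct -j 12345 --format=JobID,ReqMem,MaxRSS,State
# ReqMem is what Slurm reserved on the node(s) for the whole run;
# MaxRSS is the peak memory the job steps actually used.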
Yes. It seems that whatever the user specifies, Slurm will reserve. The
real-time memory usage of the other jobs is less than what the users
specified. I thought Slurm would handle that dynamically in order to put
more jobs into the running state.
Regards,
Mahmood
On Wed, Apr 17, 2019 at 7:54 PM Prentice B
We often receive errors due to socket timeouts on send/recv operations:
slurm_load_jobs error: Socket timed out on send/recv operation
slurm_load_node: Socket timed out on send/recv operation
What could cause these errors? How likely is it that job_submit.lua could cause them?
We have a program runni
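Two things that are usually worth checking first (a sketch, not a diagnosis;
the slurm.conf path may differ on your system): sdiag shows how loaded
slurmctld is with RPCs, and MessageTimeout in slurm.conf is the client-side
timeout behind exactly this message. Since job_submit.lua runs inside
slurmctld at submit time, a slow or blocking plugin can add to that load.

# Controller RPC statistics: look for a growing backlog or very high
# counts of REQUEST_JOB_INFO / REQUEST_NODE_INFO.
$ sdiag

# Client-side timeout that produces "Socket timed out on send/recv":
$ grep -i MessageTimeout /etc/slurm/slurm.conf
# (the default is 10 seconds; raising it papers over load rather than fixing it)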
Mahmood,
What do you see as the problem here? To me, there is no problem and the
scheduler is working exactly as it should. The reason "Resources" means
that there are not enough computing resources available for your job to
run right now, so the job is sitting in the queue in the pending state.
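For reference, the reason code can be read straight from squeue (%r prints
it; 12345 is a placeholder job ID):

$ squeue -j 12345 -o "%.10i %.9P %.8T %.12r"
# The last column shows the reason: "Resources" means no node currently has
# enough free CPUs/memory for the request; "Priority" means other jobs are ahead.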
I think there isn't enough memory.
AllocTRES shows mem=55G, and your job wants another 40G although the node
only has 63G in total.
Best,
Andreas
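One way to check this directly (a sketch, using the node name from the
script below):

$ scontrol show node compute-0-0 | grep -E 'RealMemory|AllocTRES'
# RealMemory is the total memory Slurm can schedule on the node; AllocTRES
# shows what running jobs have already reserved (here mem=55G), which leaves
# only about 8G free, so a 40G request has to wait.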
On 17.04.2019 at 16:45, Mahmood Naderan wrote:
Hi,
Although it was fine for previous job runs, the following script is now
stuck as PD with the reason "Resources".
$ cat slurm_script.sh
#!/bin/bash
#SBATCH --output=test.out
#SBATCH --job-name=g09-test
#SBATCH --ntasks=20
#SBATCH --nodelist=compute-0-0
#SBATCH --mem=40GB
#SBATCH --account=z7
#
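Given the memory numbers above, one option (assuming other nodes in the
partition have enough free memory, which I can't tell from here) is to drop
the --nodelist pin and let the scheduler place the job on any node that can
satisfy 20 tasks and 40GB:

#!/bin/bash
#SBATCH --output=test.out
#SBATCH --job-name=g09-test
#SBATCH --ntasks=20
#SBATCH --mem=40GB
#SBATCH --account=z7
# --nodelist=compute-0-0 removed: with ~55G of its 63G already allocated,
# a 40G request cannot start on that node until the other job finishes.
# (rest of the script unchanged)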