Re: [slurm-users] combine RAM between different nodes

2019-04-17 Thread Alex Chekholko
Hey Susanne, In order to "combine" RAM between different systems, you will need a hardware/software solution like ScaleMP, or you need a software framework like OpenMPI. If your software is already written to use MPI, then, in a sense, it is already "combining" the memory. SLURM is a resource manager and
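For illustration, a minimal sketch of what an MPI-style submission spreading a ~1 TB footprint across three nodes could look like, assuming the application itself is MPI-aware; the binary name my_mpi_app, the module name, and the per-node memory figure are placeholders, not from this thread:

#!/bin/bash
#SBATCH --job-name=mpi-bigmem
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1
#SBATCH --mem=340G            # --mem is per node; 3 x 340G gives roughly 1 TB in aggregate
module load openmpi           # site-specific; adjust to the local environment
srun ./my_mpi_app             # srun starts one MPI rank per task across the three nodes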

[slurm-users] combine RAM between different nodes

2019-04-17 Thread Susanne Pfeifer
Dear Slurm community, I am relatively new to Slurm and I am wondering whether it is possible to combine RAM between different nodes (e.g., can I combine three nodes with 384GB RAM each to run a job that requires 1TB RAM)? And if so, could you please advise how to do so (I can't seem to find any d

Re: [slurm-users] Pending with resource problems

2019-04-17 Thread Prentice Bisbal
No, Slurm goes strictly by what the job specifies for memory at submit time. Slurm has no way of knowing how much memory a job might need in the future. The only way to safely share a node is for Slurm to reserve the requested memory for the duration of the job. To do otherwise would be a disaster
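A quick way to see that scheduling accounts for the requested figure rather than real-time usage (the format string is just one convenient choice):

squeue -t RUNNING -o "%.10i %.10u %.8m %.6D %.2t"   # %m prints the memory each job requested, not what it is currently using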

Re: [slurm-users] Pending with resource problems

2019-04-17 Thread Mahmood Naderan
Yes. It seems that Slurm reserves whatever the user specifies. The real-time memory usage of the other jobs is less than what the users specified. I thought that Slurm would handle that dynamically in order to put more jobs into the running state. Regards, Mahmood On Wed, Apr 17, 2019 at 7:54 PM Prentice B
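One way to right-size future requests is to compare what finished jobs actually peaked at with what they asked for; <jobid> is a placeholder:

sacct -j <jobid> --format=JobID,JobName,ReqMem,MaxRSS,State   # MaxRSS (per step) vs. ReqMem shows how much headroom was requested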

[slurm-users] Socket Timed Out on Send/Recv Operation

2019-04-17 Thread Yang Liu
We often receive errors due to a socket timeout on send/recv operations: slurm_load_jobs error: Socket timed out on send/recv operation slurm_load_node: Socket timed out on send/recv operation What could cause these errors? How likely is it that job_submit.lua could cause such errors? We have a program runni
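Two standard checks when slurmctld stops answering RPCs in time, offered as a starting point rather than a diagnosis:

scontrol show config | grep -i MessageTimeout   # current RPC timeout in seconds
sdiag                                           # controller RPC and scheduler statistics, useful for spotting overload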

Re: [slurm-users] Pending with resource problems

2019-04-17 Thread Prentice Bisbal
Mahmood, What do you see as the problem here? To me, there is no problem and the scheduler is working exactly as it should. The reason "Resources" means that there are not enough computing resources available for your job to run right now, so the job is sitting in the queue in the pending state
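For reference, a minimal sketch of how to inspect a job pending with Reason=Resources; <jobid> is a placeholder:

squeue -j <jobid> -o "%.10i %.2t %.20r"                                       # %r prints the pending reason
scontrol show job <jobid> | grep -E 'JobState|Reason|ReqNodeList|MinMemory'   # requested node and memory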

Re: [slurm-users] Pending with resource problems

2019-04-17 Thread Henkel, Andreas
I think there isn't enough memory. AllocTRES shows mem=55G, and your job wants another 40G, although the node only has 63G in total. Best, Andreas On 17.04.2019 at 16:45, Mahmood Naderan <mahmood...@gmail.com> wrote: Hi, Although it was fine for previous job runs, the following script now
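To check that arithmetic on the node itself (compute-0-0 and the 55G/63G figures come from the thread; the field names are standard scontrol output):

scontrol show node compute-0-0 | grep -E 'RealMemory|AllocTRES'
# With RealMemory around 63G and AllocTRES already showing mem=55G, only
# about 8G is unallocated, so a new 40G request cannot start on this node.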

[slurm-users] Pending with resource problems

2019-04-17 Thread Mahmood Naderan
Hi, Although it was fine for previous job runs, the following script is now stuck as PD (pending) with the reason "Resources". $ cat slurm_script.sh #!/bin/bash #SBATCH --output=test.out #SBATCH --job-name=g09-test #SBATCH --ntasks=20 #SBATCH --nodelist=compute-0-0 #SBATCH --mem=40GB #SBATCH --account=z7 #
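Assuming the diagnosis in the replies above is right, one possible adjustment is to drop the hard node pin so the scheduler may place the job on any node with 40GB of unallocated memory; this is a sketch of that variant, not the original poster's script:

#!/bin/bash
#SBATCH --output=test.out
#SBATCH --job-name=g09-test
#SBATCH --ntasks=20
#SBATCH --mem=40GB
#SBATCH --account=z7
# --nodelist=compute-0-0 is intentionally omitted here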