A couple of comments/possible suggestions.
First, it looks to me like all the jobs are run from the same directory
with the same input/output files. Or am I missing something?
Also, what MPI library is being used?
I would suggest verifying whether any of the jobs in question are terminating
normally. I.e.
I have got this all wrong. Paddy Doyle has got it right.
However, are you SURE that mpirun is not creating tasks on the other
machines?
I would look at the compute nodes while the job is running and do
ps -eaf --forest
Also using mpirun to run a single core gives me the heebie-jeebies...
You are right, but I'm actually supporting the system administrator of that
cluster; I'll mention this to him.
Besides that, the user runs this for loop to submit the jobs:
# submit.sh #
typeset -i i=1
typeset -i j=12500 # number of frames goes to each core = number of frames (100)/40 (cor
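The user's script is cut off above, so here is a minimal, self-contained sketch of such a frame-splitting submission loop. The 40-job count, the 12500-frames-per-job figure, and the `run_frames.sh` job script name are assumptions based on the numbers mentioned in the thread, not the user's actual script:

```shell
#!/bin/bash
# submit_sketch.sh -- hypothetical reconstruction of the user's loop.
typeset -i i=1
typeset -i j=12500    # assumed frames per job (total frames / 40 jobs)
typeset -i start=1

while (( i <= 40 )); do
    end=$(( start + j - 1 ))
    # Print the frame range each job would handle.
    echo "job $i: frames $start-$end"
    # Hypothetical submission line; run_frames.sh is a placeholder:
    # sbatch --export=START=$start,END=$end run_frames.sh
    start=$(( end + 1 ))
    i=$(( i + 1 ))
done
```

Run directly, it just prints the 40 frame ranges; uncommenting the `sbatch` line would submit one independent single-core job per range.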
Hi Matteo,
On Fri, Jun 29, 2018 at 10:13:33AM +, Matteo Guglielmi wrote:
> Dear community,
>
> I have a user who usually submits 36 (identical) jobs at a time using a
> simple for loop,
> thus jobs are sbatched all at the same time.
>
> Each job requests a single core and all jobs are independent from one another.
Matteo, a stupid question, but if these are single-CPU jobs, why is mpirun
being used?
Is your user using these 36 jobs to construct a parallel job to run charmm?
If the mpirun process is killed, then yes, all the other processes which it
started on the other compute nodes will be killed along with it.
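As an aside, for truly single-core jobs the mpirun layer can be dropped entirely, which avoids this whole class of problem. A minimal batch script sketch; the binary name (`charmm`) comes from the thread, while the input/output file names are placeholders:

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

# Invoke the binary directly: with no mpirun in the process tree,
# killing one job cannot take down processes belonging to other jobs
# on other nodes.
charmm < input.inp > output.out
```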
I suspect your u
Dear community,
I have a user who usually submits 36 (identical) jobs at a time using a simple
for loop,
thus jobs are sbatched all at the same time.
Each job requests a single core and all jobs are independent from one another
(read different input files and write to different output files).
Jobs