Re: [slurm-users] Multinode MPI job

2019-03-29 Thread Mahmood Naderan
I found out that a standard script that specifies the number of tasks and the memory per CPU does the same thing I was expecting from packjob (heterogeneous job).
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#SBATCH --ntasks=14
#SBATCH --mem-per-cpu=17G
#SBATCH --nodes=6
#SBATCH --partit
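(A minimal sketch of such a homogeneous script, under the assumption that it targets the QUARTZ partition and z5 account used elsewhere in this thread and launches pw.x with mpirun as in Mahmood's earlier messages:)

#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#SBATCH --ntasks=14
#SBATCH --mem-per-cpu=17G
#SBATCH --nodes=6
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
# 14 MPI ranks at 17 GB per rank, spread by Slurm over the 6 allocated nodes
mpirun pw.x -i mos2.rlx.in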

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
I tested with
env strace srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
in the Slurm script and everything is fine now!! This is going to be a nasty bug to find...
Regards,
Mahmood
On Thu, Mar 28, 2019 at 9:18 PM Mahmood Naderan wrote:
> Yes that works.
>
> $

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
Yes that works.
$ grep "Parallel version" big-mem
Parallel version (MPI), running on 1 processors
Parallel version (MPI), running on 1 processors
Parallel version (MPI), running on 1 processors
Parallel version (MPI), running on 1 processors
$ squeue

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Frava
Well, does it also crash when you run it with two nodes in a normal way (not using heterogeneous jobs)?
#!/bin/bash
#SBATCH --job-name=myQE_2Nx2MPI
#SBATCH --output=big-mem
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --mem-per-cpu=16g
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#
sr
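(A sketch of what the complete two-node script presumably looks like; the launch line is assumed to be srun with the same pw.x input used throughout this thread:)

#!/bin/bash
#SBATCH --job-name=myQE_2Nx2MPI
#SBATCH --output=big-mem
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --mem-per-cpu=16g
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#
# 2 nodes x 2 ranks per node = 4 MPI tasks, the same total as the heterogeneous attempt
srun pw.x -i mos2.rlx.in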

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
BTW, when I manually run on a node, e.g. compute-0-2, I get this output:
]$ mpirun -np 4 pw.x -i mos2.rlx.in
Program PWSCF v.6.2 starts on 28Mar2019 at 11:40:36
This program is part of the open-source Quantum ESPRESSO suite for quantum simulation of materials; please cite

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Mahmood Naderan
The run is not consistent. I have manually tested "mpirun -np 4 pw.x -i mos2.rlx.in" on the compute-0-2 and rocks7 nodes and it is fine. However, with the script "srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in" I see some errors in the output file which results in abortion

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Frava
I didn't receive the last mail from Mahmood but Marcus is right, Mahmood's heterogeneous job submission seems to be working now. Well, separating each pack in the srun command and asking for the correct number of tasks to be launched for each pack is the way I figured the heterogeneous jobs worked

Re: [slurm-users] Multinode MPI job

2019-03-28 Thread Marcus Wagner
Hi Mahmood,
On 3/28/19 7:33 AM, Mahmood Naderan wrote:
> srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
> Still only one node is running the processes
No, the processes are running as had been asked for.
$ squeue
JOBID PARTITION

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
$ srun --version
slurm 18.08.4
I have noticed that after 60 seconds, the job is aborted according to the output log file.
srun: First task exited 60s ago
srun: step:759.0 pack_group:0 tasks 0-1: exited
srun: step:760.0 pack_group:1 tasks 0-1: running
srun: step:760.0 pack_group:1 tasks 2-3: exite
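(The "First task exited 60s ago" message reflects srun killing the remaining tasks a configurable time, apparently 60 seconds here, after the first task exits. Purely as a diagnostic sketch, not the fix eventually found in this thread, that wait can be lengthened or disabled per step with srun's --wait option:)

# Wait indefinitely instead of killing the other tasks 60s after the first
# exit (--wait=0 means unlimited). The task that exits early still needs to
# be debugged; this only stops the cascade.
srun --wait=0 --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in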

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Chris Samuel
On Wednesday, 27 March 2019 11:33:30 PM PDT Mahmood Naderan wrote:
> Still only one node is running the processes
What does "srun --version" say? Do you get any errors in your output file from the second pack job?
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley,

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
> srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in
Still only one node is running the processes.
$ squeue
 JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
 755+1    QUARTZ  myQE ghatee  R  0:47     1 rocks7

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Frava
Hi, if you try this SBATCH script, does it work?
#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#
#SBATCH --mem-per-cpu=16g --ntasks=2
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#
#SBATCH packjob
#
#SBATCH --mem-per-cpu=10g --ntasks=4
#SBATCH -N 1
#SBATCH --partiti

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
OK. The two different partitions I saw were due to not specifying a partition name for the first set (before packjob). Here is a better script:
#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#SBATCH --mem-per-cpu=16g --ntasks=2
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
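(Combining this with the srun line that eventually worked later in the thread, a sketch of the complete heterogeneous script; the second pack group's sizes are taken from Frava's earlier example, and its partition/account lines are assumed to mirror the first group:)

#!/bin/bash
#SBATCH --job-name=myQE
#SBATCH --output=big-mem
#SBATCH --mem-per-cpu=16g --ntasks=2
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
#SBATCH packjob
#SBATCH --mem-per-cpu=10g --ntasks=4
#SBATCH -N 1
#SBATCH --partition=QUARTZ
#SBATCH --account=z5
# One srun, with each pack group addressed explicitly and given its own task count
srun --pack-group=0 --ntasks=2 : --pack-group=1 --ntasks=4 pw.x -i mos2.rlx.in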

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Christopher Samuel
On 3/27/19 11:29 AM, Mahmood Naderan wrote:
> Thank you very much. You are right. I got it.
Cool, good to hear. I'd love to hear whether you get heterogeneous MPI jobs working too!
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
Thank you very much. You are right. I got it.
Regards,
Mahmood
On Wed, Mar 27, 2019 at 10:33 PM Thomas M. Payerle wrote:
> As partition CLUSTER is not in your /etc/slurm/parts file, it likely was
> added via scontrol command.
> Presumably you or a colleague created a CLUSTER partition, wheth

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Thomas M. Payerle
As partition CLUSTER is not in your /etc/slurm/parts file, it likely was added via the scontrol command. Presumably you or a colleague created a CLUSTER partition, whether intentionally or not. Use "scontrol show partition CLUSTER" to view it.
On Wed, Mar 27, 2019 at 1:44 PM Mahmood Naderan wrote: >
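(A quick sketch of the relevant scontrol commands; the delete is only appropriate if the CLUSTER partition really is unwanted and nothing depends on it:)

# Inspect the partition that does not appear in the config file
scontrol show partition CLUSTER
# If it was created by accident, it can be removed again at runtime
scontrol delete PartitionName=CLUSTER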

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
So, it seems that it is not an easy thing at the moment!
> Partitions are defined by the systems administrators, you'd need to
> speak with them about their reasoning for those.
It's me :) I haven't defined a partition named CLUSTER.
Regards,
Mahmood
On Wed, Mar 27, 2019 at 8:42 PM Christopher S

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Christopher Samuel
On 3/27/19 8:39 AM, Mahmood Naderan wrote:
> mpirun pw.x -i mos2.rlx.in
You will need to read the documentation for this:
https://slurm.schedmd.com/heterogeneous_jobs.html
Especially note both of these:
IMPORTANT: The ability to execute a single application across more th

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Prentice Bisbal
On 3/27/19 11:25 AM, Christopher Samuel wrote:
> On 3/27/19 8:07 AM, Prentice Bisbal wrote:
>> sbatch -n 24 -w Node1,Node2
>> That will allocate 24 cores (tasks, technically) to your job, and only use Node1 and Node2. You did not mention any memory requirements of your job, so I assumed memory is

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Mahmood Naderan
> If your SLURM version is at least 18.08 then you should be able to do it with a heterogeneous job. See https://slurm.schedmd.com/heterogeneous_jobs.html
From the example on that page, I have written this:
#!/bin/bash
#SBATCH --job-name=myQE

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Christopher Samuel
On 3/27/19 8:07 AM, Prentice Bisbal wrote:
> sbatch -n 24 -w Node1,Node2
> That will allocate 24 cores (tasks, technically) to your job, and only use Node1 and Node2. You did not mention any memory requirements of your job, so I assumed memory is not an issue and didn't specify any in my comman

Re: [slurm-users] Multinode MPI job

2019-03-27 Thread Prentice Bisbal
On 3/25/19 8:09 AM, Mahmood Naderan wrote:
> Hi
> Is it possible to submit a multinode MPI job with the following config?
> Node1: 16 cpu, 90GB
> Node2: 8 cpu, 20GB
> Regards,
> Mahmood
Yes:
sbatch -n 24 -w Node1,Node2
That will allocate 24 cores (tasks, technically) to your job, and only use Node
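(A sketch of how that command would be used with a batch script; the wrapper script name is hypothetical, and pw.x with its input file are taken from later messages in this thread:)

#!/bin/bash
# job.sh -- hypothetical wrapper script around the application
srun pw.x -i mos2.rlx.in

# 24 tasks spread across exactly these two nodes; no --mem is given, so memory
# is only bounded by each node's configured RealMemory
sbatch -n 24 -w Node1,Node2 job.sh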

Re: [slurm-users] Multinode MPI job

2019-03-25 Thread Frava
Hi Mahmood,
If your SLURM version is at least 18.08 then you should be able to do it with a heterogeneous job. See https://slurm.schedmd.com/heterogeneous_jobs.html
Cheers,
Rafael.
On Mon, 25 Mar 2019 at 13:10, Mahmood Naderan wrote:
> Hi
> Is it possible to submit a multinode MPI job with

[slurm-users] Multinode MPI job

2019-03-25 Thread Mahmood Naderan
Hi,
Is it possible to submit a multinode MPI job with the following config?
Node1: 16 cpu, 90GB
Node2: 8 cpu, 20GB
Regards,
Mahmood