[slurm-users] Fwd: An issue with HOSTNAME env var when using salloc/srun for interactive job with Slurm 17.11.7
Hi,

We've recently upgraded to Slurm 17.11.7 from 16.05.8. We noticed that the HOSTNAME environment variable does not reflect the compute node for an interactive job started with the salloc/srun command. Instead it still points to the submit hostname, although SLURMD_NODENAME reflects the correct compute node name.

$ salloc --immediate -p manycore --constraint=xeon64c --exclusive -O -N 1 --qos=high srun --pty bash -i
salloc: Granted job allocation 2291315
salloc: Waiting for resource configuration
salloc: Nodes mc-1 are ready for job
[user1@mc-1 test]$ echo $HOSTNAME
login-3
[user1@mc-1 test]$ echo $SLURMD_NODENAME
nc-1

Is this a bug introduced with the 17.11.x version or something that has been there before? According to our user, it used to point to the compute node name. BTW, if I test the environment variable with a batch job, the HOSTNAME environment variable reflects the compute node name correctly.

Thanks,
- Chansup
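One likely explanation, offered as an assumption rather than something confirmed in this thread: salloc/srun propagate the exported environment of the submitting shell, and bash does not overwrite an already-exported HOSTNAME, so the login node's value survives on the compute node. A minimal workaround sketch for the interactive shell, e.g. in ~/.bashrc (the guard variable and the choice between hostname and SLURMD_NODENAME are assumptions, not something from the thread):

# Reset HOSTNAME inside Slurm jobs so it names the node the shell runs on.
if [ -n "$SLURM_JOB_ID" ]; then
    export HOSTNAME=$(hostname)    # or: export HOSTNAME=$SLURMD_NODENAME
fi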
[slurm-users] Unexpected MPI process distribution with the --exclusive flag
Hi Everyone,

I've recently discovered that when an MPI job is submitted with the --exclusive flag, Slurm fills up each node even if the --ntasks-per-node flag is used to set how many MPI processes are scheduled on each node. Without the --exclusive flag, Slurm works as expected.

Our system is running Slurm 17.11.7.

The following options work as expected: each node gets 16 MPI processes until all 980 MPI processes are scheduled, across a total of 62 compute nodes. Each of the first 61 nodes has 16 MPI processes and the last one has 4 MPI processes, which is 980 MPI processes in total.

#SBATCH -n 980
#SBATCH --ntasks-per-node=16

However, if the --exclusive option is added, Slurm fills up each node with 28 MPI processes (the compute nodes have 28 cores). Interestingly, Slurm still allocates 62 compute nodes although only 35 of them are actually used to distribute the 980 MPI processes.

#SBATCH -n 980
#SBATCH --ntasks-per-node=16
#SBATCH --exclusive

Has anyone seen this behavior?

Thanks,
- Chansup
Re: [slurm-users] Unexpected MPI process distribution with the --exclusive flag
Thanks for the replies. I didn't specify earlier, but we're using Intel MPI, and the following environment variable, I_MPI_JOB_RESPECT_PROCESS_PLACEMENT, fixed my issue:

#SBATCH --ntasks=980
#SBATCH --ntasks-per-node=16
#SBATCH --exclusive

export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off
mpirun -np $SLURM_NTASKS -perhost $SLURM_NTASKS_PER_NODE /path/to/MPI/app

Thanks,
- Chansup

On Wed, Jul 31, 2019 at 2:01 AM Daniel Letai wrote:
>
> On 7/30/19 6:03 PM, Brian Andrus wrote:
>
> I think this may be more on how you are calling mpirun and the mapping of
> processes.
>
> With the "--exclusive" option, the processes are given access to all the
> cores on each box, so mpirun has a choice. IIRC, the default is to pack
> them by slot, so fill one node, then move to the next. Whereas you want to
> map by node (one process per node, cycling by node).
>
> From the man page for mpirun (Open MPI):
> --map-by <object>: Map to the specified object; defaults to socket.
> Supported options include slot, hwthread, core, L1cache, L2cache, L3cache,
> socket, numa, board, node, sequential, distance, and ppr. Any object can
> include modifiers by adding a : and any combination of PE=n (bind n
> processing elements to each proc), SPAN (load balance the processes across
> the allocation), OVERSUBSCRIBE (allow more processes on a node than
> processing elements), and NOOVERSUBSCRIBE. This includes PPR, where the
> pattern would be terminated by another colon to separate it from the
> modifiers.
>
> So adding "--map-by node" would give you what you are looking for.
> Of course, this syntax is for Open MPI's mpirun command, so YMMV.
>
> If using srun (as recommended) instead of invoking mpirun directly, you
> can still achieve the same functionality using exported environment
> variables as per the mpirun man page, like this:
>
> OMPI_MCA_rmaps_base_mapping_policy=node srun --export
> OMPI_MCA_rmaps_base_mapping_policy ...
>
> in your sbatch script.
>
> Brian Andrus
>
> On 7/30/2019 5:14 AM, CB wrote:
>
> Hi Everyone,
>
> I've recently discovered that when an MPI job is submitted with the
> --exclusive flag, Slurm fills up each node even if the --ntasks-per-node
> flag is used to set how many MPI processes are scheduled on each node.
> Without the --exclusive flag, Slurm works as expected.
>
> Our system is running Slurm 17.11.7.
>
> The following options work as expected: each node gets 16 MPI processes
> until all 980 MPI processes are scheduled, across a total of 62 compute
> nodes. Each of the first 61 nodes has 16 MPI processes and the last one
> has 4 MPI processes, which is 980 MPI processes in total.
>
> #SBATCH -n 980
> #SBATCH --ntasks-per-node=16
>
> However, if the --exclusive option is added, Slurm fills up each node with
> 28 MPI processes (the compute nodes have 28 cores). Interestingly, Slurm
> still allocates 62 compute nodes although only 35 of them are actually
> used to distribute the 980 MPI processes.
>
> #SBATCH -n 980
> #SBATCH --ntasks-per-node=16
> #SBATCH --exclusive
>
> Has anyone seen this behavior?
>
> Thanks,
> - Chansup
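A related alternative, not from the original thread: launching the tasks with srun instead of mpirun lets Slurm enforce the requested layout itself, assuming the MPI library supports direct launch by srun (e.g. via PMI). A minimal sketch, reusing the placeholders from the script above:

#SBATCH --ntasks=980
#SBATCH --ntasks-per-node=16
#SBATCH --exclusive

# srun is given the task count and per-node layout explicitly from the
# variables sbatch exports for the allocation, so mpirun's own placement
# logic never comes into play.
srun --ntasks=$SLURM_NTASKS --ntasks-per-node=$SLURM_NTASKS_PER_NODE /path/to/MPI/app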
[slurm-users] Issue with "hetjob" directive with heterogeneous job submission script
Hi,

I'm running Slurm 19.05.5.

I've tried to write a job submission script for a heterogeneous job following the example at https://slurm.schedmd.com/heterogeneous_jobs.html

But it failed with the following error message:

$ sbatch new.bash
sbatch: error: Invalid directive found in batch script: hetjob

Below is the new.bash job script:

$ cat new.bash
#!/bin/bash
#SBATCH --cpus-per-task=4 --mem-per-cpu=16g --ntasks=1
#SBATCH hetjob
#SBATCH --cpus-per-task=2 --mem-per-cpu=1g --ntasks=8
srun exec_myapp.bash

Has anyone tried this?

I've tried the following command at the command line and it worked fine:

$ sbatch --cpus-per-task=4 --mem-per-cpu=16g --ntasks=1 : --cpus-per-task=2 --mem-per-cpu=1g --ntasks=8 exec_myapp.bash

Thanks,
Chansup
Re: [slurm-users] Issue with "hetjob" directive with heterogeneous job submission script
Hi Mike,

Thanks for the info. Yes, Slurm 19.05 works with the "#SBATCH packjob" directive.

- Chansup

On Thu, Mar 5, 2020 at 10:40 AM Renfro, Michael wrote:
> I'm going to guess the job directive changed between earlier releases and
> 20.02. A version of the page from last year [1] has no mention of hetjob,
> and uses packjob instead.
>
> On a related note, is there a canonical location for older versions of
> Slurm documentation? My local man pages are always consistent with the
> installed version, but lots of people Google part of their solution, and
> are always pointed to documentation for the latest stable release.
>
> [1]
> https://web.archive.org/web/20191227221359/https://slurm.schedmd.com/heterogeneous_jobs.html
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> 931 372-3601 / Tennessee Tech University
>
> > On Mar 4, 2020, at 2:05 PM, CB wrote:
> >
> > Hi,
> >
> > I'm running Slurm 19.05.5.
> >
> > I've tried to write a job submission script for a heterogeneous job
> > following the example at https://slurm.schedmd.com/heterogeneous_jobs.html
> >
> > But it failed with the following error message:
> >
> > $ sbatch new.bash
> > sbatch: error: Invalid directive found in batch script: hetjob
> >
> > Below is the new.bash job script:
> > $ cat new.bash
> > #!/bin/bash
> > #SBATCH --cpus-per-task=4 --mem-per-cpu=16g --ntasks=1
> > #SBATCH hetjob
> > #SBATCH --cpus-per-task=2 --mem-per-cpu=1g --ntasks=8
> > srun exec_myapp.bash
> >
> > Has anyone tried this?
> >
> > I've tried the following command at the command line and it worked fine.
> > $ sbatch --cpus-per-task=4 --mem-per-cpu=16g --ntasks=1 : --cpus-per-task=2 --mem-per-cpu=1g --ntasks=8 exec_myapp.bash
> >
> > Thanks,
> > Chansup
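For reference, here is the script from the original post with the directive that Slurm 19.05 accepts, as confirmed above; everything else is unchanged:

#!/bin/bash
#SBATCH --cpus-per-task=4 --mem-per-cpu=16g --ntasks=1
#SBATCH packjob
#SBATCH --cpus-per-task=2 --mem-per-cpu=1g --ntasks=8
srun exec_myapp.bash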
[slurm-users] Running an MPI job across two partitions
Hi,

I'm running Slurm 19.05.

Is there any way to launch an MPI job on a group of distributed nodes from two or more partitions, where each partition has distinct compute nodes?

I've looked at the heterogeneous job support, but it creates two separate jobs.

If there is no such capability with the current Slurm, I'd like to hear any recommendations or suggestions.

Thanks,
Chansup
Re: [slurm-users] Running an MPI job across two partitions
Hi Andy,

Yes, they are on the same network fabric.

Sure, creating another partition that encompasses all of the nodes of the two or more partitions would solve the problem. I am wondering if there are any other ways instead of creating a new partition?

Thanks,
Chansup

On Mon, Mar 23, 2020 at 11:51 AM Riebs, Andy wrote:
> When you say "distinct compute nodes," are they at least on the same
> network fabric?
>
> If so, the first thing I'd try would be to create a new partition that
> encompasses all of the nodes of the other two partitions.
>
> Andy
>
> From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On
> Behalf Of CB
> Sent: Monday, March 23, 2020 11:32 AM
> To: Slurm User Community List
> Subject: [slurm-users] Running an MPI job across two partitions
>
> Hi,
>
> I'm running Slurm 19.05.
>
> Is there any way to launch an MPI job on a group of distributed nodes
> from two or more partitions, where each partition has distinct compute
> nodes?
>
> I've looked at the heterogeneous job support but it creates two separate
> jobs.
>
> If there is no such capability with the current Slurm, I'd like to hear
> any recommendations or suggestions.
>
> Thanks,
> Chansup
Re: [slurm-users] Running an MPI job across two partitions
Hi Michael,

Thanks for the comment. I was just checking if there is any other way to do the job before introducing another partition. So it appears to me that creating a new partition is the way to go.

Thanks,
Chansup

On Mon, Mar 23, 2020 at 1:25 PM Renfro, Michael wrote:
> Others might have more ideas, but anything I can think of would require a
> lot of manual steps to avoid mutual interference with jobs in the other
> partitions (allocating resources for a dummy job in the other partition,
> modifying the MPI host list to include nodes in the other partition, etc.).
>
> So why not make another partition encompassing both sets of nodes?
>
> > On Mar 23, 2020, at 10:58 AM, CB wrote:
> >
> > Hi Andy,
> >
> > Yes, they are on the same network fabric.
> >
> > Sure, creating another partition that encompasses all of the nodes of
> > the two or more partitions would solve the problem.
> > I am wondering if there are any other ways instead of creating a new
> > partition?
> >
> > Thanks,
> > Chansup
> >
> > On Mon, Mar 23, 2020 at 11:51 AM Riebs, Andy wrote:
> > When you say "distinct compute nodes," are they at least on the same
> > network fabric?
> >
> > If so, the first thing I'd try would be to create a new partition that
> > encompasses all of the nodes of the other two partitions.
> >
> > Andy
> >
> > From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On
> > Behalf Of CB
> > Sent: Monday, March 23, 2020 11:32 AM
> > To: Slurm User Community List
> > Subject: [slurm-users] Running an MPI job across two partitions
> >
> > Hi,
> >
> > I'm running Slurm 19.05.
> >
> > Is there any way to launch an MPI job on a group of distributed nodes
> > from two or more partitions, where each partition has distinct compute
> > nodes?
> >
> > I've looked at the heterogeneous job support but it creates two separate
> > jobs.
> >
> > If there is no such capability with the current Slurm, I'd like to hear
> > any recommendations or suggestions.
> >
> > Thanks,
> > Chansup
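For completeness, a sketch of what the spanning partition could look like in slurm.conf; the partition and node names below are hypothetical, and a node is allowed to belong to more than one partition:

# Existing partitions (hypothetical node ranges)
PartitionName=parta Nodes=nodea[001-032] State=UP
PartitionName=partb Nodes=nodeb[001-016] State=UP
# Overlapping partition spanning both node sets, so a single MPI job can
# be allocated across all of them
PartitionName=spanning Nodes=nodea[001-032],nodeb[001-016] State=UP

After updating slurm.conf on all hosts, an "scontrol reconfigure" should be enough to make the new partition visible.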
[slurm-users] Executing slurm command from Lua job_submit script?
Hi,

I'm running Slurm 19.05.

I'm trying to execute some Slurm commands from the Lua job_submit script under a certain condition, but I found that they are not executed and return nothing. For example, I tried to execute a "sinfo" command from an external shell script, but it didn't work.

Does Slurm prohibit executing Slurm commands from the Lua job_submit script?

Thanks,
- Chansup
Re: [slurm-users] Executing slurm command from Lua job_submit script?
Hi Marcus,

The essence of the code looks like this: the job_submit.lua script executes an external script,

os.execute("/etc/slurm/test.sh".." "..job_desc.partition)

and the external test.sh runs the following command to get the partition summary for further processing:

sinfo -h -p $1 -s

But this sinfo command returned no result.

Regards,
Chansup

On Fri, Apr 3, 2020 at 1:28 AM Marcus Wagner wrote:
> Hi Chansup,
>
> could you provide a code snippet?
>
> Best
> Marcus
>
> Am 02.04.2020 um 19:43 schrieb CB:
> > Hi,
> >
> > I'm running Slurm 19.05.
> >
> > I'm trying to execute some Slurm commands from the Lua job_submit script
> > for a certain condition.
> > But, I found that they are not executed and return nothing.
> > For example, I tried to execute a "sinfo" command from an external shell
> > script but it didn't work.
> >
> > Does Slurm prohibit executing Slurm commands from the Lua job_submit
> > script?
> >
> > Thanks,
> > - Chansup
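Two things worth checking, offered as assumptions rather than a confirmed diagnosis. First, os.execute() only returns an exit status, so even when the command runs, its output is not visible to the Lua code; reading the output inside Lua would require io.popen(). Second, job_submit.lua runs inside slurmctld, whose environment may not have sinfo on its PATH, and calling Slurm client commands from the plugin is generally discouraged anyway, because the controller holds internal locks while the plugin runs and a command that calls back into slurmctld can block. A debugging sketch of the external script (the sinfo path and log location are hypothetical; adjust to the local install):

#!/bin/bash
# Call sinfo by absolute path, since slurmctld's environment may have a
# minimal PATH, and capture any error output in a log file so failures
# become visible outside the plugin.
/usr/bin/sinfo -h -p "$1" -s 2>> /var/log/slurm/job_submit_test.log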