[slurm-users] Fwd: An issue with HOSTNAME env var when using salloc/srun for interactive job with Slurm 17.11.7

2018-07-10 Thread CB
Hi,

We've recently upgraded to Slurm 17.11.7 from 16.05.8.

We noticed that the environment variable HOSTNAME does not reflect the
compute node in an interactive job started with the salloc/srun command.
Instead it still points to the submit hostname, although SLURMD_NODENAME
reflects the correct compute node name.

$ salloc --immediate -p manycore --constraint=xeon64c --exclusive -O -N 1
--qos=high  srun --pty bash -i
salloc: Granted job allocation 2291315
salloc: Waiting for resource configuration
salloc: Nodes mc-1 are ready for job

[user1@mc-1 test]$ echo $HOSTNAME
login-3

[user1@mc-1 test]$ echo $SLURMD_NODENAME
nc-1

Is this a bug introduced with the 17.11.x version, or something that has
been there before?  According to our user, it used to point to the compute
node name.

BTW, if I test with a batch job, the HOSTNAME environment variable reflects
the compute node name correctly.
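
A possible workaround, until the propagation behavior itself is sorted out,
is to reset the variable when an interactive shell starts on the compute
node. This is only a sketch: it assumes the value is inherited from the
propagated submit-host environment and that bash keeps an inherited
HOSTNAME rather than resetting it, which matches what we see above.

# In ~/.bashrc (runs on login and compute nodes; the guard limits the
# reset to shells started inside a Slurm allocation):
if [ -n "$SLURM_JOB_ID" ]; then
    export HOSTNAME=$(hostname)   # re-derive HOSTNAME from the local node
fi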

Thanks,
- Chansup


[slurm-users] Unexpected MPI process distribution with the --exclusive flag

2019-07-30 Thread CB
Hi Everyone,

I've recently discovered that when an MPI job is submitted with the
--exclusive flag, Slurm fills up each node even if the --ntasks-per-node
flag is used to set how many MPI processes are scheduled on each node.
Without the --exclusive flag, Slurm works as expected.

Our system is running with Slurm 17.11.7.

With the following options, each node receives 16 MPI processes until all
980 MPI processes are scheduled, using a total of 62 compute nodes: each of
the first 61 nodes runs 16 MPI processes and the last one runs 4, for 980
MPI processes in total.
#SBATCH -n 980
#SBATCH --ntasks-per-node=16

However, if the --exclusive option is added, Slurm fills up each node with
28 MPI processes (each compute node has 28 cores).  Interestingly, Slurm
still allocates 62 compute nodes although only 35 of them are actually used
to distribute the 980 MPI processes.

#SBATCH -n 980
#SBATCH --ntasks-per-node=16
#SBATCH --exclusive

Has anyone seen this behavior?

Thanks,
- Chansup


Re: [slurm-users] Unexpected MPI process distribution with the --exclusive flag

2019-07-31 Thread CB
Thanks for the replies.

I didn't specify earlier, but we're using Intel MPI, and the following
environment variable, I_MPI_JOB_RESPECT_PROCESS_PLACEMENT, fixed my issue.

#SBATCH --ntasks=980
#SBATCH --ntasks-per-node=16
#SBATCH --exclusive

export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off
mpirun -np $SLURM_NTASKS -perhost $SLURM_NTASKS_PER_NODE  /path/to/MPI/app
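
For reference, a minimal alternative sketch: assuming Intel MPI is
configured to use Slurm's PMI, launching with srun instead of mpirun lets
Slurm's own task distribution (including --ntasks-per-node) apply directly,
so neither the environment variable nor -perhost should be needed:

srun -n $SLURM_NTASKS /path/to/MPI/app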

Thanks,

- Chansup



On Wed, Jul 31, 2019 at 2:01 AM Daniel Letai  wrote:

>
> On 7/30/19 6:03 PM, Brian Andrus wrote:
>
> I think this may be more about how you are calling mpirun and how the
> processes are mapped.
>
> With the "--exclusive" option, the processes are given access to all the
> cores on each box, so mpirun has a choice. IIRC, the default is to pack
> them by slot, so fill one node, then move to the next, whereas you want to
> map by node (one process per node, cycling by node).
>
> From the man page for mpirun (Open MPI):
> --map-by <object>: Map to the specified object, defaults to socket.
> Supported options include slot, hwthread, core, L1cache, L2cache, L3cache,
> socket, numa, board, node, sequential, distance, and ppr. Any object can
> include modifiers by adding a : and any combination of PE=n (bind n
> processing elements to each proc), SPAN (load balance the processes across
> the allocation), OVERSUBSCRIBE (allow more processes on a node than
> processing elements), and NOOVERSUBSCRIBE. This includes PPR, where the
> pattern would be terminated by another colon to separate it from the
> modifiers.
>
> so adding "--map-by node" would give you what you are looking for.
> Of course, this syntax is for Openmpi's mpirun command, so YMMV
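>
> A minimal sketch of that invocation (Open MPI mpirun syntax; the
> application path is a placeholder, as in the original post):
>
> mpirun --map-by node -np $SLURM_NTASKS /path/to/MPI/app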
>
> If using srun (as recommended) instead of invoking mpirun directly, you
> can still achieve the same functionality using exported environment
> variables as per the mpirun man page, like this:
>
> OMPI_MCA_rmaps_base_mapping_policy=node srun --export
> OMPI_MCA_rmaps_base_mapping_policy ...
>
> in your sbatch script.
>
> Brian Andrus


[slurm-users] Issue with "hetjob" directive with heterogeneous job submission script

2020-03-04 Thread CB
Hi,

I'm running Slurm 19.05.5.

I've tried to write a job submission script for a heterogeneous job
following the example at https://slurm.schedmd.com/heterogeneous_jobs.html

But it failed with the following error message:

$ sbatch new.bash
sbatch: error: Invalid directive found in batch script: hetjob

Below is the new.bash job script:
$ cat new.bash
#!/bin/bash
#SBATCH --cpus-per-task=4 --mem-per-cpu=16g --ntasks=1
#SBATCH hetjob
#SBATCH --cpus-per-task=2 --mem-per-cpu=1g  --ntasks=8
srun exec_myapp.bash

Has anyone tried this?

I've tried the following command at the command line and it worked fine.
$ sbatch --cpus-per-task=4 --mem-per-cpu=16g --ntasks=1 : --cpus-per-task=2
--mem-per-cpu=1g  --ntasks=8 exec_myapp.bash

Thanks,
Chansup


Re: [slurm-users] Issue with "hetjob" directive with heterogeneous job submission script

2020-03-05 Thread CB
Hi Mike,

Thanks for the info.
Yes, Slurm 19.05 works with the "#SBATCH packjob".
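
For reference, a sketch of the original script with that one directive
changed, which is what the fix amounts to on 19.05:

$ cat new.bash
#!/bin/bash
#SBATCH --cpus-per-task=4 --mem-per-cpu=16g --ntasks=1
#SBATCH packjob
#SBATCH --cpus-per-task=2 --mem-per-cpu=1g  --ntasks=8
srun exec_myapp.bash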

- Chansup

On Thu, Mar 5, 2020 at 10:40 AM Renfro, Michael  wrote:

> I’m going to guess the job directive changed between earlier releases and
> 20.02. A version of the page from last year [1] has no mention of hetjob,
> and uses packjob instead.
>
> On a related note, is there a canonical location for older versions of
> Slurm documentation? My local man pages are always consistent with the
> installed version, but lots of people Google part of their solution, and
> are always pointed to documentation for the latest stable release.
>
> [1]
> https://web.archive.org/web/20191227221359/https://slurm.schedmd.com/heterogeneous_jobs.html
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> 931 372-3601 / Tennessee Tech University


[slurm-users] Running an MPI job across two partitions

2020-03-23 Thread CB
Hi,

I'm running Slurm 19.05 version.

Is there any way to launch an MPI job on a group of distributed nodes from
two or more partitions, where each partition has distinct compute nodes?

I've looked at the heterogeneous job support, but it creates two separate
jobs.

If there is no such capability with the current Slurm, I'd like to hear any
recommendations or suggestions.

Thanks,
Chansup


Re: [slurm-users] Running an MPI job across two partitions

2020-03-23 Thread CB
Hi Andy,

Yes, they are on the same network fabric.

Sure, creating another partition that encompasses all of the nodes of the
two or more partitions would solve the problem.
I am wondering if there are any other ways, instead of creating a new
partition?

Thanks,
Chansup


On Mon, Mar 23, 2020 at 11:51 AM Riebs, Andy  wrote:

> When you say “distinct compute nodes,” are they at least on the same
> network fabric?
>
>
>
> If so, the first thing I’d try would be to create a new partition that
> encompasses all of the nodes of the other two partitions.
>
>
>
> Andy
>


Re: [slurm-users] Running an MPI job across two partitions

2020-03-24 Thread CB
Hi Michael,

Thanks for the comment.

I was just checking if there is any other way to do the job before
introducing another partition.
So it appears to me that creating a new partition is the way to go.
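
For reference, a minimal sketch of what the combined partition might look
like in slurm.conf (the partition and node names here are hypothetical, not
from our actual configuration):

PartitionName=combined Nodes=part1-node[001-032],part2-node[001-030] State=UP

With something like that in place, a job submitted with -p combined can be
scheduled across both sets of nodes.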

Thanks,
Chansup

On Mon, Mar 23, 2020 at 1:25 PM Renfro, Michael  wrote:

> Others might have more ideas, but anything I can think of would require a
> lot of manual steps to avoid mutual interference with jobs in the other
> partitions (allocating resources for a dummy job in the other partition,
> modifying the MPI host list to include nodes in the other partition, etc.).
>
> So why not make another partition encompassing both sets of nodes?
>


[slurm-users] Executing slurm command from Lua job_submit script?

2020-04-02 Thread CB
Hi,

I'm running Slurm 19.05.

I'm trying to execute some Slurm commands from the Lua job_submit script
under a certain condition.
But I found that the command is not executed and returns nothing.
For example, I tried to execute the "sinfo" command from an external shell
script, but it didn't work.

Does Slurm prohibit executing Slurm commands from the Lua job_submit
script?

Thanks,
- Chansup


Re: [slurm-users] Executing slurm command from Lua job_submit script?

2020-04-03 Thread CB
Hi Marcus,

The essence of the code looks like this:

In the job_submit.lua script, it executes an external script:

os.execute("/etc/slurm/test.sh".." "..job_desc.partition)

and the external test.sh executes the following command to get the
partition summary for further processing:

sinfo -h -p $1 -s

But this sinfo command returned no result.
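
Two things may be worth checking here. First, Lua's os.execute only returns
the command's exit status, not its output, so anything the script prints has
to be written to a file (or captured with io.popen) for job_submit.lua to
see it. Second, the plugin runs inside slurmctld, whose environment can
differ from a login shell's. A possible debugging sketch for the external
script (the log path is hypothetical):

#!/bin/bash
# Debugging version of the external script called from job_submit.lua:
# log the environment plus sinfo's stdout and stderr so any failure under
# slurmctld becomes visible.
{
  echo "=== $(date) partition=$1 PATH=$PATH"
  sinfo -h -p "$1" -s
} >> /tmp/job_submit_sinfo.log 2>&1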

Regards,
Chansup

On Fri, Apr 3, 2020 at 1:28 AM Marcus Wagner wrote:

> Hi Chansup,
>
> could you provide a code snippet?
>
> Best
> Marcus
>