[slurm-users] unable to Hold and release the job using scontrol

2021-05-22 Thread Zainul Abiddin
Hi All, I am trying to hold a job with scontrol but am not able to, and I do not understand why. Can anyone please explain the concepts of Hold and Release, and Suspend and Resume? Below are the steps I have tried. [root@master ~]# cat test.sh #!/bin/bash #SBATCH -N 1 #SBATC

[slurm-users] Unable to get output file once job is completed

2021-02-07 Thread Zainul Abiddin
Hi All, I have created accounts and users on a cluster. I logged in as one of my users and submitted a job. The job completed, but the output file was not created. Can anyone help me with this? [zain@smaster ~]$ sacctmgr list associations cluster=scluster format=Account,Cluster,User,Fairshare tree

[slurm-users] How to assign maximum cores to particular department

2021-02-04 Thread Zainul Abiddin
Hi all, Please help us with the details for the scenario below. We have two nodes (smaster, snode) with 16 cores each, 32 cores in total, and 3 departments with 4 students each. 1. How to create a cluster (ex: hpcc) 2. How to create an account (D1) and add it to a specific cluster (hpcc). 3. How

Re: [slurm-users] How to submit simple job on Master and Compute nodes

2021-02-04 Thread Zainul Abiddin
Thanks for your input, Loris Bennett. Now I am able to submit jobs on both nodes. -- Regards, Zain On Thu, Feb 4, 2021 at 5:04 PM Zainul Abiddin wrote: > Hi All, > > Please help me to submit a simple job on master and compute nodes. > Here are my commands > > [root

[slurm-users] How to submit simple job on Master and Compute nodes

2021-02-04 Thread Zainul Abiddin
Hi All, Please help me to submit a simple job on master and compute nodes. Here are my commands: [root@smaster ~]# sinfo -Nl Thu Feb 04 16:54:58 2021 NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON smaster 1 hpc* idle 4 4:1:1

Re: [slurm-users] Slurm - sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused

2021-02-04 Thread Zainul Abiddin
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST debug up infinite 1 idle snode hpc* up infinite 1 idle smaster [root@smaster ~]# Regards, Zain On Tue, Feb 2, 2021 at 6:35 PM Zainul Abiddin wrote: > Hi All, > I have done slurmdbd configuration and while i am try

[slurm-users] Slurm - sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused

2021-02-02 Thread Zainul Abiddin
Hi All, I have configured slurmdbd, and when I try to run sacct I get the error below. [root@smaster ~]# sacct sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused sacct: error: S

Re: [slurm-users] Slurm - Munge configuration details

2021-02-02 Thread Zainul Abiddin
) MAC: sha1 (3) ZIP: none (0) UID: root (0) GID: root (0) LENGTH: 0 [root@smaster ~]# Regards, Zain On Tue, Feb 2, 2021 at 6:05 PM Zainul Abiddin wrote: > Hi, > [root@smaster ~]# munge -n | unmunge > STATUS: Success (0)

[slurm-users] Slurm : compute node status is UNKNOWN and Reason=NO NETWORK ADDRESS FOUND

2021-02-02 Thread Zainul Abiddin
Hi All, Please help me to resolve this issue. My compute node (snode) status is UNKNOWN and Reason=NO NETWORK ADDRESS FOUND. Master node (smaster): [root@smaster ~]# cat /etc/slurm/slurm.conf # slurm.conf file generated by configurator easy.html. # Put this file on all nodes of your cluster. # S

Re: [slurm-users] Slurm - Munge configuration details

2021-02-02 Thread Zainul Abiddin
# On Tue, Feb 2, 2021 at 6:00 PM Zainul Abiddin wrote: > Hi All, > I am new to Slurm and trying to set up Slurm 20.11.2 on CentOS 7 > My environment is Master node (smaster) + compute Node (snode) > and I am using > https://www.slothparadise.com/how-to-install-slurm-on-centos-7-clus

[slurm-users] Slurm - Munge configuration details

2021-02-02 Thread Zainul Abiddin
Hi All, I am new to Slurm and am trying to set up Slurm 20.11.2 on CentOS 7. My environment is a master node (smaster) plus a compute node (snode), and I am using the https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/ guide to set up Slurm on the master and compute nodes. I have tried installing Mung