A few things to check here:

  *   Ensure that your firewall ports are open: 6817 (slurmctld), 6818 (slurmd), 
6819 (slurmdbd) and 3306 (MySQL/MariaDB). These are the default ports.
  *   Make sure that munge is working correctly:
      $ munge -n | unmunge
  *   Make sure you go through the accounting web page as well: 
https://slurm.schedmd.com/accounting.html
     *   In particular, ensure that you can connect to the MySQL server, create 
the slurm user within the MySQL database, and give it the required permissions. 
Go through the “Live example” on the accounting web page.
  *   Walk through your log files, especially slurmdbd.log, and clear up all 
errors.
  *   As a general comment, put as few configuration options as possible into 
your slurm.conf and slurmdbd.conf files, and use the defaults when you can. Add 
items incrementally and carefully so you can back out easily when you make 
mistakes (and you will!).
  *   In my slurm.conf, I have also specified AccountingStorageHost, 
AccountingStorageUser and AccountingStoragePort, though I am not sure I need 
any of them.
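
As a rough sketch of the log and config checks above: the two helpers below are hypothetical (they are not part of Slurm, and the names conf_value and unknown_db_hosts are my own). The first prints the value of a key from a Key=Value style config file, and the second lists the database hosts that slurmdbd.log reports as unknown. The file paths in the usage comments are the ones from your message; adjust them if yours differ.

```shell
#!/bin/sh
# Hypothetical helper functions, not shipped with Slurm.

# Print the value of a key from a Slurm-style "Key=Value" config file.
# $1 = config file, $2 = key name (matched case-insensitively).
# Lines starting with '#' are skipped; only the first match is printed.
conf_value() {
    awk -F= -v key="$2" '
        /^[ \t]*#/ { next }
        {
            k = $1
            gsub(/[ \t]/, "", k)
            if (tolower(k) == tolower(key)) {
                v = substr($0, index($0, "=") + 1)
                sub(/^[ \t]+/, "", v)
                sub(/[ \t]+$/, "", v)
                print v
                exit
            }
        }' "$1"
}

# Print each distinct host name that appears in an
# "Unknown MySQL server host '...'" error in the given log file.
# $1 = slurmdbd log file
unknown_db_hosts() {
    sed -n "s/.*MySQL server host '\([^']*\)'.*/\1/p" "$1" | sort -u
}

# Example usage (paths taken from the original message):
#   conf_value /etc/slurm/slurmdbd.conf StorageHost
#   unknown_db_hosts /var/log/slurm/slurmdbd.log
```

If the host printed from the log differs from the StorageHost value in the file, it is worth checking whether the running slurmdbd was actually restarted after the last edit, or whether it is reading a different config file than the one you are editing.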

Mike

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of 
slurm-users-requ...@lists.schedmd.com <slurm-users-requ...@lists.schedmd.com>
Date: Tuesday, February 2, 2021 at 8:16 AM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: slurm-users Digest, Vol 40, Issue 4

Today's Topics:

   1. Slurm - sacct: error: slurm_persist_conn_open_without_init:
      failed to open persistent connection to host:localhost:6819:
      Connection refused (Zainul Abiddin)
   2. Re: Slurm - Munge configuration details (Benson Muite)


----------------------------------------------------------------------

Message: 1
Date: Tue, 2 Feb 2021 18:35:20 +0530
From: Zainul Abiddin <zainul1...@gmail.com>
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Slurm - sacct: error:
        slurm_persist_conn_open_without_init: failed to open persistent
        connection to host:localhost:6819: Connection refused
Message-ID:
        <caa9r82u0l7vdzdhvp_1kfwmvrll-cc5vhavr2sgtuwn_1ax...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi All,
I have configured slurmdbd, but when I try to query accounting data with
*sacct* I get the error below.

[root@smaster ~]# sacct
sacct: error: slurm_persist_conn_open_without_init: failed to open
persistent connection to host:localhost:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
[root@smaster ~]#

My slurmdbd configuration :
[root@smaster ~]# cat /etc/slurm/slurmdbd.conf
AuthType=auth/munge
DbdAddr=localhost
DbdHost=localhost
SlurmUser=slurm
DebugLevel=4
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePass=password
StorageUser=slurm
StorageLoc=slurm_acct_db

[root@smaster ~]# chown slurm: /etc/slurm/slurmdbd.conf
[root@smaster ~]# chmod 600 /etc/slurm/slurmdbd.conf
[root@smaster ~]# mkdir /var/log/slurm
[root@smaster ~]# touch /var/log/slurm/slurmdbd.log
[root@smaster ~]# chown slurm: /var/log/slurm/slurmdbd.log
[root@smaster ~]# scontrol show config | grep AccountingStorageHost
AccountingStorageHost   = localhost

Note:
I edited /etc/slurm/slurm.conf and changed the line below:
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/slurmdbd
I then restarted all the services.

[root@smaster ~]# for i in munge slurmd slurmctld slurmdbd; do service $i status; done
Redirecting to /bin/systemctl status munge.service
● munge.service - MUNGE authentication service
   Loaded: loaded (/usr/lib/systemd/system/munge.service; enabled; vendor
preset: disabled)
   Active: active (running) since Tue 2021-02-02 13:21:10 IST; 3h 36min ago
     Docs: man:munged(8)
 Main PID: 20613 (munged)
   CGroup: /system.slice/munge.service
           └─20613 /usr/sbin/munged

Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Stopped MUNGE
authentication service.
Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Starting MUNGE
authentication service...
Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Started MUNGE
authentication service.
Redirecting to /bin/systemctl status slurmd.service
● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor
preset: disabled)
   Active: active (running) since Tue 2021-02-02 13:21:10 IST; 3h 36min ago
 Main PID: 20637 (slurmd)
   CGroup: /system.slice/slurmd.service
           └─20637 /usr/sbin/slurmd -D

Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Started Slurm node
daemon.
Feb 02 15:30:47 smaster.calligotech.com slurmd[20637]: slurmd: Launching
batch job 7 for UID 0
Feb 02 15:31:46 smaster.calligotech.com slurmd[20637]: slurmd: Launching
batch job 8 for UID 0
Feb 02 15:33:43 smaster.calligotech.com slurmd[20637]: slurmd: Launching
batch job 9 for UID 0

Redirecting to /bin/systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled;
vendor preset: disabled)
   Active: active (running) since Tue 2021-02-02 13:21:11 IST; 3h 36min ago
 Main PID: 20660 (slurmctld)
   CGroup: /system.slice/slurmctld.service
           └─20660 /usr/sbin/slurmctld -D

Feb 02 13:21:11 smaster.calligotech.com systemd[1]: Started Slurm
controller daemon.
Redirecting to /bin/systemctl status slurmdbd.service
● slurmdbd.service - Slurm DBD accounting daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled;
vendor preset: disabled)
   Active: active (running) since Tue 2021-02-02 16:29:11 IST; 28min ago
 Main PID: 24146 (slurmdbd)
   CGroup: /system.slice/slurmdbd.service
           └─24146 /usr/sbin/slurmdbd -D

Feb 02 16:29:11 smaster.calligotech.com systemd[1]: Started Slurm DBD
accounting daemon.
[root@smaster ~]# srun --ntasks=2 --label /bin/hostname
srun: job 22 queued and waiting for resources
srun: job 22 has been allocated resources
1: smaster.calligotech.com
0: smaster.calligotech.com
[root@smaster ~]#


However, when I run the command below, I get the same error:

[root@smaster ~]# sacct
sacct: error: slurm_persist_conn_open_without_init: failed to open
persistent connection to host:localhost:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
[root@smaster ~]#

I have tried the following troubleshooting steps:

[root@smaster ~]# telnet localhost 6819
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
[root@smaster ~]#

[root@smaster ~]# mysql -p -u slurm slurm_acct_db
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 9
Server version: 10.1.48-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input
statement.

MariaDB [slurm_acct_db]> show tables;
Empty set (0.00 sec)

MariaDB [slurm_acct_db]>

Then I added DbdPort and restarted the services:
[root@smaster ~]# cat /etc/slurm/slurmdbd.conf
AuthType=auth/munge
DbdAddr=localhost
DbdHost=localhost
*DbdPort=6819*
SlurmUser=slurm
DebugLevel=4
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePass=password
StorageUser=slurm
StorageLoc=slurm_acct_db
[root@smaster ~]#

[root@smaster ~]# for i in munge slurmd slurmctld slurmdbd; do service $i status; done
Redirecting to /bin/systemctl status munge.service
● munge.service - MUNGE authentication service
   Loaded: loaded (/usr/lib/systemd/system/munge.service; enabled; vendor
preset: disabled)
   Active: active (running) since Tue 2021-02-02 13:21:10 IST; 3h 55min ago
     Docs: man:munged(8)
 Main PID: 20613 (munged)
   CGroup: /system.slice/munge.service
           └─20613 /usr/sbin/munged

Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Stopped MUNGE
authentication service.
Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Starting MUNGE
authentication service...
Feb 02 13:21:10 smaster.calligotech.com systemd[1]: Started MUNGE
authentication service.
Redirecting to /bin/systemctl status slurmd.service
● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor
preset: disabled)
   Active: active (running) since Tue 2021-02-02 13:21:10 IST; 3h 55min ago
 Main PID: 20637 (slurmd)
   CGroup: /system.slice/slurmd.service
           └─20637 /usr/sbin/slurmd -D

Feb 02 15:30:47 smaster.calligotech.com slurmd[20637]: slurmd: Launching
batch job 7 for UID 0
Feb 02 15:31:46 smaster.calligotech.com slurmd[20637]: slurmd: Launching
batch job 8 for UID 0
Feb 02 15:33:43 smaster.calligotech.com slurmd[20637]: slurmd: Launching
batch job 9 for UID 0
Feb 02 15:38:45 smaster.calligotech.com slurmd[20637]: slurmd: Launching
batch job 12 for UID 0

Redirecting to /bin/systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled;
vendor preset: disabled)
   Active: active (running) since Tue 2021-02-02 13:21:11 IST; 3h 55min ago
 Main PID: 20660 (slurmctld)
   CGroup: /system.slice/slurmctld.service
           └─20660 /usr/sbin/slurmctld -D

Feb 02 13:21:11 smaster.calligotech.com systemd[1]: Started Slurm
controller daemon.
Redirecting to /bin/systemctl status slurmdbd.service
● slurmdbd.service - Slurm DBD accounting daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled;
vendor preset: disabled)
   Active: active (running) since Tue 2021-02-02 16:29:11 IST; 47min ago
 Main PID: 24146 (slurmdbd)
   CGroup: /system.slice/slurmdbd.service
           └─24146 /usr/sbin/slurmdbd -D

Feb 02 16:29:11 smaster.calligotech.com systemd[1]: Started Slurm DBD
accounting daemon.
[root@smaster ~]# ps -ef |grep slurm
root     20637     1  0 13:21 ?        00:00:00 /usr/sbin/slurmd -D
slurm    20660     1  0 13:21 ?        00:00:08 /usr/sbin/slurmctld -D
root     24146     1  0 16:29 ?        00:00:00 /usr/sbin/slurmdbd -D
root     25395 18378  0 17:17 pts/2    00:00:00 grep --color=auto slurm
[root@smaster ~]# sacct
sacct: error: slurm_persist_conn_open_without_init: failed to open
persistent connection to host:localhost:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
[root@smaster ~]#

[root@smaster ~]# tail /var/log/slurm/slurmdbd.log
[2021-02-02T17:16:01.913] error: mysql_real_connect failed: 2005 Unknown
MySQL server host 'smater' (-2)
[2021-02-02T17:16:01.913] error: The database must be up when starting the
MYSQL plugin.  Trying again in 5 seconds.
[2021-02-02T17:16:06.963] error: mysql_real_connect failed: 2005 Unknown
MySQL server host 'smater' (-2)
[2021-02-02T17:16:06.963] error: The database must be up when starting the
MYSQL plugin.  Trying again in 5 seconds.
[2021-02-02T17:16:12.083] error: mysql_real_connect failed: 2005 Unknown
MySQL server host 'smater' (-2)
[2021-02-02T17:16:12.083] error: The database must be up when starting the
MYSQL plugin.  Trying again in 5 seconds.
[2021-02-02T17:16:17.140] error: mysql_real_connect failed: 2005 Unknown
MySQL server host 'smater' (-2)
[2021-02-02T17:16:17.141] error: The database must be up when starting the
MYSQL plugin.  Trying again in 5 seconds.
[2021-02-02T17:16:22.804] error: mysql_real_connect failed: 2005 Unknown
MySQL server host 'smater' (-2)
[2021-02-02T17:16:22.804] error: The database must be up when starting the
MYSQL plugin.  Trying again in 5 seconds.
[root@smaster ~]#

The problem still remains the same. Please help me resolve this issue.

Regards,
Zain

------------------------------

Message: 2
Date: Tue, 2 Feb 2021 16:16:09 +0300
From: Benson Muite <benson_mu...@emailplus.org>
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Slurm - Munge configuration details
Message-ID: <bd36d545-4fd7-05ec-4a51-bb2743258...@emailplus.org>
Content-Type: text/plain; charset=utf-8; format=flowed

On 2/2/21 4:00 PM, Zainul Abiddin wrote:
> Hi Benson,
>
> I am not able to do passwordless ssh between the master and compute nodes
> using the Munge service.
> When I run the command below, it asks for a password for
> the compute node.
>
> /Am I configuring this properly or not? I need clarity on this./
>
> [root@smaster ~]# munge -n | ssh snode unmunge
> root@snode's password:
> STATUS:           Success (0)
> ENCODE_HOST:      smaster.calligotech.com (192.168.1.195)
> ENCODE_TIME:      2021-02-01 13:58:16 +0530 (1612168096)
> DECODE_TIME:      2021-02-01 13:58:21 +0530 (1612168101)
> TTL:              300
> CIPHER:           aes128 (4)
> MAC:              sha1 (3)
> ZIP:              none (0)
> UID:              root (0)
> GID:              root (0)
> LENGTH:           0
>
> [root@smaster ~]#
>
> Regards,
> Zain
>
Hi Zain,

Perhaps try using the IP address instead of the hostname?

Also, are the clocks synchronized? See
https://slurm.schedmd.com/quickstart_admin.html

Benson



End of slurm-users Digest, Vol 40, Issue 4
******************************************