[slurm-users] When I start slurmctld, there are some errors in log.
When I start slurmctld, there are some errors in the log, and the running job information is not stored in MySQL via slurmdbd.

I set

AccountingStoragePass=/usr/local/munge-munge-0.5.13/var/run/munge/munge.socket.2
AccountingStorageType=accounting_storage/slurmdbd
JobAcctGatherType=jobacct_gather/linux

in slurm.conf.

The following is the log that slurmctld outputs.

[2018-06-15T11:05:44.763] Terminate signal (SIGINT or SIGTERM) received
[2018-06-15T11:05:44.807] Saving all slurm state
[2018-06-15T11:05:45.101] error: slurmdbd: Sending fini msg: No error
[2018-06-15T11:05:45.126] layouts: all layouts are now unloaded.
[2018-06-15T11:06:07.761] slurmctld version 17.11.7 started on cluster myslurm
[2018-06-15T11:06:07.785] error: slurm_persist_conn_open_without_init: failed to open persistent connection to localhost:6819: Connection refused
[2018-06-15T11:06:07.785] error: slurmdbd: Sending PersistInit msg: Connection refused
[2018-06-15T11:06:07.785] error: slurmdbd: Sending PersistInit msg: Connection refused
[2018-06-15T11:06:07.787] layouts: no layout to initialize
[2018-06-15T11:06:07.824] error:
[2018-06-15T11:06:07.824] error: ### SEVERE SECURITY VULERABILTY ###
[2018-06-15T11:06:07.824] error: ### StateSaveLocation DIRECTORY IS WORLD WRITABLE ###
[2018-06-15T11:06:07.824] error: ### CORRECT FILE PERMISSIONS ###
[2018-06-15T11:06:07.824] error:
[2018-06-15T11:06:07.824] layouts: loading entities/relations information
[2018-06-15T11:06:07.824] Recovered state of 1 nodes
[2018-06-15T11:06:07.824] Recovered JobID=12 State=0x3 NodeCnt=0 Assoc=2
[2018-06-15T11:06:07.825] Recovered information about 1 jobs
[2018-06-15T11:06:07.825] cons_res: select_p_node_init
[2018-06-15T11:06:07.825] cons_res: preparing for 1 partitions
[2018-06-15T11:06:07.825] Recovered state of 0 reservations
[2018-06-15T11:06:07.825] _preserve_plugins: backup_controller not specified
[2018-06-15T11:06:07.825] cons_res: select_p_reconfigure
[2018-06-15T11:06:07.825] cons_res: select_p_node_init
[2018-06-15T11:06:07.825] cons_res: preparing for 1 partitions
[2018-06-15T11:06:07.825] Running as primary controller
[2018-06-15T11:06:07.825] Registering slurmctld at port 6817 with slurmdbd.
[2018-06-15T11:06:07.825] error: slurmdbd: Sending PersistInit msg: Connection refused
[2018-06-15T11:06:07.825] error: slurmdbd: Sending PersistInit msg: Connection refused
[2018-06-15T11:06:07.826] No parameter for mcs plugin, default values set
[2018-06-15T11:06:07.826] mcs: MCSParameters = (null). ondemand set.
[2018-06-15T11:06:10.829] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2
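The "Connection refused" lines in the log say that slurmctld could not open a TCP connection to localhost:6819. A quick way to confirm whether anything is listening there is to try the same connection by hand. The sketch below is only an illustration: the helper name `port_is_open` is made up for this example, and the host and port are simply the ones that appear in the log.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open("localhost", 6819)` on the controller host would return False in the situation the log describes, which would suggest that slurmdbd was not running or not listening on that address when slurmctld started.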
[slurm-users] cluster not registered
When I use slurmdbd, it outputs the following errors. I have run "sacctmgr add clustr myslurm".

[2018-06-15T17:11:54.685] slurmdbd version 17.11.7 started
[2018-06-15T17:12:05.377] DBD_JOB_COMPLETE: cluster not registered
[2018-06-15T17:12:05.379] DBD_STEP_START: cluster not registered
[2018-06-15T17:12:05.459] DBD_STEP_COMPLETE: cluster not registered
[2018-06-15T17:12:05.460] DBD_JOB_COMPLETE: cluster not registered
[2018-06-15T17:12:05.943] DBD_CLUSTER_TRES: cluster not registered
[2018-06-15T17:12:09.614] DBD_JOB_START: cluster not registered
[2018-06-15T17:16:55.095] DBD_CLUSTER_TRES: cluster not registered
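Since the post mentions adding the cluster with sacctmgr, one thing that can be double-checked is whether a cluster entry named myslurm actually exists in the accounting database. The sketch below assumes sacctmgr is on PATH and that `sacctmgr -n -P show cluster format=Cluster` prints one cluster name per line; the function names are hypothetical. It only confirms that the entry exists and does not by itself explain why slurmdbd logs "cluster not registered".

```python
import subprocess
from typing import Set


def parse_cluster_names(sacctmgr_output: str) -> Set[str]:
    """Extract cluster names from parsable (-P) sacctmgr output."""
    names = set()
    for line in sacctmgr_output.splitlines():
        # With -P, fields are '|'-separated; the first field is the name.
        name = line.split("|", 1)[0].strip()
        if name:
            names.add(name)
    return names


def cluster_is_known(cluster: str) -> bool:
    """Ask sacctmgr whether a cluster entry with this name exists."""
    result = subprocess.run(
        ["sacctmgr", "-n", "-P", "show", "cluster", "format=Cluster"],
        capture_output=True,
        text=True,
        check=True,  # raise if sacctmgr itself fails
    )
    return cluster in parse_cluster_names(result.stdout)
```

On a host where sacctmgr can reach slurmdbd, `cluster_is_known("myslurm")` would report whether the add command took effect.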
Re: [slurm-users] When I start slurmctld, there are some errors in log.
I didn't have the directory /var/spool/slurmctld/, so I created it with mkdir and ran "chown slurm:slurm /var/spool/slurmctld". But the errors are still there.

2018-06-15 16:00 GMT+08:00 John Hearns :
> And your permissions on the directory /var/spool/slurmctld/ are
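The question about directory permissions relates to the "StateSaveLocation DIRECTORY IS WORLD WRITABLE" banner in the log. A minimal sketch for inspecting a directory's mode is shown below. It assumes the warning corresponds to the write bit for "other" users being set, which is my reading of the message rather than something stated in the thread, and the helper names are invented for illustration.

```python
import os
import stat


def is_world_writable(path: str) -> bool:
    """Return True if 'other' users have write permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWOTH)


def describe_permissions(path: str) -> str:
    """Return an ls-style mode string such as 'drwxr-xr-x' for path."""
    return stat.filemode(os.stat(path).st_mode)
```

Running both functions on whatever directory StateSaveLocation points to (not necessarily /var/spool/slurmctld) shows whether that directory, rather than a newly created one, is the one with loose permissions.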
Re: [slurm-users] When I start slurmctld, there are some errors in log.
I have changed the StateSaveLocation, and now the errors are gone. It works OK.

2018-06-15 17:42 GMT+08:00 John Hearns :
> Please do three things for the list:
>
> a) cat /etc/*elease*
>
> b) give details on how Slurm was installed on the master node and the
> compute nodes
>
> c) How was your slurm.conf file created? Is this file identical on master
> node and compute nodes?
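Because the fix was a change to StateSaveLocation, it can help to see which value slurm.conf actually sets. The following is a rough sketch of reading simple Key=Value lines from the file's text; the function name is hypothetical, keys are lower-cased because slurm.conf parameter names are case-insensitive, and multi-pair lines such as node definitions are not handled properly.

```python
from typing import Dict


def parse_slurm_conf(text: str) -> Dict[str, str]:
    """Parse simple 'Key=Value' lines from slurm.conf text into a dict.

    Keys are lower-cased. A line carrying several Key=Value pairs is
    stored under its first key, with the rest of the line as the value.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings
```

Reading the file and looking up `"statesavelocation"` in the result gives the directory whose permissions matter. The location of slurm.conf depends on how Slurm was installed, so the path to open is left to the reader.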
Re: [slurm-users] cluster not registered
I have changed the StateSaveLocation, and now the errors are gone. It works OK.

2018-06-15 17:21 GMT+08:00 UGI :
> When I use slurmdbd, it output the following errors.
>
> I have run "sacctmgr add clustr myslurm".
>
> [2018-06-15T17:11:54.685] slurmdbd version 17.11.7 started
> [2018-06-15T17:12:05.377] DBD_JOB_COMPLETE: cluster not registered
> [2018-06-15T17:12:05.379] DBD_STEP_START: cluster not registered
> [2018-06-15T17:12:05.459] DBD_STEP_COMPLETE: cluster not registered
> [2018-06-15T17:12:05.460] DBD_JOB_COMPLETE: cluster not registered
> [2018-06-15T17:12:05.943] DBD_CLUSTER_TRES: cluster not registered
> [2018-06-15T17:12:09.614] DBD_JOB_START: cluster not registered
> [2018-06-15T17:16:55.095] DBD_CLUSTER_TRES: cluster not registered