[slurm-users] When I start slurmctld, there are some errors in log.

2018-06-15 Thread UGI
When I start slurmctld, there are some errors in the log, and job
information is not being stored in MySQL via slurmdbd.

I set

AccountingStoragePass=/usr/local/munge-munge-0.5.13/var/run/munge/munge.socket.2

AccountingStorageType=accounting_storage/slurmdbd

JobAcctGatherType=jobacct_gather/linux

in slurm.conf.


The following is the log output from slurmctld.

[2018-06-15T11:05:44.763] Terminate signal (SIGINT or SIGTERM) received
[2018-06-15T11:05:44.807] Saving all slurm state
[2018-06-15T11:05:45.101] error: slurmdbd: Sending fini msg: No error
[2018-06-15T11:05:45.126] layouts: all layouts are now unloaded.
[2018-06-15T11:06:07.761] slurmctld version 17.11.7 started on cluster myslurm
[2018-06-15T11:06:07.785] error: slurm_persist_conn_open_without_init: failed to open persistent connection to localhost:6819: Connection refused
[2018-06-15T11:06:07.785] error: slurmdbd: Sending PersistInit msg: Connection refused
[2018-06-15T11:06:07.785] error: slurmdbd: Sending PersistInit msg: Connection refused
[2018-06-15T11:06:07.787] layouts: no layout to initialize
[2018-06-15T11:06:07.824] error:
[2018-06-15T11:06:07.824] error: ###   SEVERE SECURITY VULERABILTY   ###
[2018-06-15T11:06:07.824] error: ### StateSaveLocation DIRECTORY IS WORLD WRITABLE ###
[2018-06-15T11:06:07.824] error: ### CORRECT FILE PERMISSIONS   ###
[2018-06-15T11:06:07.824] error:
[2018-06-15T11:06:07.824] layouts: loading entities/relations information
[2018-06-15T11:06:07.824] Recovered state of 1 nodes
[2018-06-15T11:06:07.824] Recovered JobID=12 State=0x3 NodeCnt=0 Assoc=2
[2018-06-15T11:06:07.825] Recovered information about 1 jobs
[2018-06-15T11:06:07.825] cons_res: select_p_node_init
[2018-06-15T11:06:07.825] cons_res: preparing for 1 partitions
[2018-06-15T11:06:07.825] Recovered state of 0 reservations
[2018-06-15T11:06:07.825] _preserve_plugins: backup_controller not specified
[2018-06-15T11:06:07.825] cons_res: select_p_reconfigure
[2018-06-15T11:06:07.825] cons_res: select_p_node_init
[2018-06-15T11:06:07.825] cons_res: preparing for 1 partitions
[2018-06-15T11:06:07.825] Running as primary controller
[2018-06-15T11:06:07.825] Registering slurmctld at port 6817 with slurmdbd.
[2018-06-15T11:06:07.825] error: slurmdbd: Sending PersistInit msg: Connection refused
[2018-06-15T11:06:07.825] error: slurmdbd: Sending PersistInit msg: Connection refused
[2018-06-15T11:06:07.826] No parameter for mcs plugin, default values set
[2018-06-15T11:06:07.826] mcs: MCSParameters = (null). ondemand set.
[2018-06-15T11:06:10.829] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2
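
The "Connection refused" lines mean that nothing accepted a TCP connection on localhost:6819 at the moment slurmctld tried to reach slurmdbd. One likely reason, though the log alone does not prove it, is that slurmdbd was not running yet. A minimal sketch for checking this from the controller host follows; the helper name port_is_open is made up for illustration, and the host and port are simply the ones shown in the log.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper, not part of Slurm. It only tests that something
    is listening; it does not verify that the listener is slurmdbd.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name resolution errors.
        return False
```

Calling port_is_open("localhost", 6819) before starting slurmctld would show whether the address from the log is reachable. If it returns False, starting slurmdbd first (or pointing slurmctld at the right host) is worth trying.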


[slurm-users] cluster not registered

2018-06-15 Thread UGI
When I use slurmdbd, it outputs the following errors.

I have run "sacctmgr add clustr myslurm".

[2018-06-15T17:11:54.685] slurmdbd version 17.11.7 started
[2018-06-15T17:12:05.377] DBD_JOB_COMPLETE: cluster not registered
[2018-06-15T17:12:05.379] DBD_STEP_START: cluster not registered
[2018-06-15T17:12:05.459] DBD_STEP_COMPLETE: cluster not registered
[2018-06-15T17:12:05.460] DBD_JOB_COMPLETE: cluster not registered
[2018-06-15T17:12:05.943] DBD_CLUSTER_TRES: cluster not registered
[2018-06-15T17:12:09.614] DBD_JOB_START: cluster not registered
[2018-06-15T17:16:55.095] DBD_CLUSTER_TRES: cluster not registered
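
When slurmdbd reports "cluster not registered", one thing worth checking is whether the cluster name that slurmctld uses ("myslurm" in the earlier log) actually appears in the accounting database. The sketch below assumes sacctmgr is on the PATH and can reach slurmdbd; the function names are made up for illustration, and the parsing assumes one cluster name per line as produced by the --noheader and --parsable2 options.

```python
import subprocess


def parse_cluster_names(output: str) -> list[str]:
    """Extract cluster names from parsable sacctmgr output.

    Each non-empty line is expected to start with the cluster name,
    optionally followed by '|' and further fields.
    """
    return [
        line.split("|")[0].strip()
        for line in output.splitlines()
        if line.strip()
    ]


def registered_clusters() -> list[str]:
    """Ask sacctmgr which clusters the accounting database knows about."""
    result = subprocess.run(
        ["sacctmgr", "--noheader", "--parsable2",
         "show", "cluster", "format=cluster"],
        capture_output=True,
        text=True,
        check=True,  # raise if sacctmgr exits with an error
    )
    return parse_cluster_names(result.stdout)
```

If "myslurm" is missing from the list returned by registered_clusters(), the add step did not take effect, and slurmdbd will keep rejecting messages from that cluster.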


Re: [slurm-users] When I start slurmctld, there are some errors in log.

2018-06-15 Thread UGI
I didn't have the directory /var/spool/slurmctld/, so I created it with
mkdir and ran "chown slurm:slurm /var/spool/slurmctld".
But the errors are still there.
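
The warning in the slurmctld log is about the permission bits of the StateSaveLocation directory ("world writable"), and chown changes only the owner, not those bits. A minimal sketch for checking the bit directly is below. The helper name is made up for illustration, and the directory to test is whatever StateSaveLocation is set to in slurm.conf, which the thread does not show.

```python
import os
import stat


def is_world_writable(path: str) -> bool:
    """Return True if 'others' have write permission on path."""
    mode = os.stat(path).st_mode
    # S_IWOTH is the write bit for users who are neither owner nor group.
    return bool(mode & stat.S_IWOTH)
```

If is_world_writable() returns True for the configured directory, removing the write bit for others (for example with "chmod o-w" on that directory) addresses what the warning complains about.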

2018-06-15 16:00 GMT+08:00 John Hearns :

> And your permissions on the directory /var/spool/slurmctld/  are 


Re: [slurm-users] When I start slurmctld, there are some errors in log.

2018-06-15 Thread UGI
I have changed the StateSaveLocation, and now the errors are gone. It works OK.

2018-06-15 17:42 GMT+08:00 John Hearns :

> Please do three things for the list:
>
> a) cat /etc/*elease*
>
> b) give details on how Slurm was installed on the master node and the
> compute nodes
>
> c) How was your slurm.conf file created? Is this file identical on master
> node and compute nodes?


Re: [slurm-users] cluster not registered

2018-06-15 Thread UGI
I have changed the StateSaveLocation, and now the errors are gone. It works OK.

