Here is some more data:

Changed slurm.conf to have


SelectType=select/cons_res

SelectTypeParameters=CR_CPU
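
A minimal sketch for double-checking which select settings a slurm.conf actually contains after an edit like the one above. The function name read_select_settings and the sample text are hypothetical, and the sketch assumes each setting sits on its own line as Key=Value, with # starting a comment.

```python
def read_select_settings(conf_text):
    """Return the SelectType* settings found in slurm.conf-style text."""
    settings = {}
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Match SelectType and SelectTypeParameters, ignoring case.
        if key.strip().lower().startswith("selecttype"):
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical excerpt mirroring the two lines changed above.
sample = """
# scheduling
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
"""
print(read_select_settings(sample))
```

Running the same function against the copy of slurm.conf on the controller and on each node would show whether they agree.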

Then restarted

 sudo systemctl restart slurmctld.service

The log on the host said:


[2017-11-29T12:23:56.384] error: we don't have select plugin type 101

[2017-11-29T12:23:56.384] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:23:56.384] error: Malformed RPC of type REQUEST_ABORT_JOB(6013) received

[2017-11-29T12:23:56.384] error: slurm_receive_msg_and_forward: Header lengths are longer than data received


Then ran sudo scontrol reconfigure, and the log said:


[2017-11-29T12:23:56.394] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:24:34.889] Message aggregation disabled

[2017-11-29T12:24:34.890] Resource spec: Reserved system memory limit not configured for this node

Sview had the running jobs cleared out of its context (they are still running), but I 
kinda expected that.

I then submitted 6 jobs to the partition that do nothing but sleep and the log 
says:


[2017-11-29T12:25:39.424] error: we don't have select plugin type 101

[2017-11-29T12:25:39.424] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.424] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.424] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.424] error: we don't have select plugin type 101

[2017-11-29T12:25:39.424] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.424] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.424] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.424] error: we don't have select plugin type 101

[2017-11-29T12:25:39.424] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.424] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.424] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.424] error: we don't have select plugin type 101

[2017-11-29T12:25:39.424] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.424] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.424] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.425] error: we don't have select plugin type 101

[2017-11-29T12:25:39.425] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.425] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.425] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.425] error: we don't have select plugin type 101

[2017-11-29T12:25:39.425] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.425] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received

[2017-11-29T12:25:39.425] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.434] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.434] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.434] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.434] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.435] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.435] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.436] error: we don't have select plugin type 101

[2017-11-29T12:25:39.436] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.436] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.436] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.436] error: we don't have select plugin type 101

[2017-11-29T12:25:39.436] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.436] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.436] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.436] error: we don't have select plugin type 101

[2017-11-29T12:25:39.436] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.436] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.436] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.436] error: we don't have select plugin type 101

[2017-11-29T12:25:39.436] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.436] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.436] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.436] error: we don't have select plugin type 101

[2017-11-29T12:25:39.436] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.436] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.436] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.437] error: we don't have select plugin type 101

[2017-11-29T12:25:39.437] error: select_g_select_jobinfo_unpack: unpack error

[2017-11-29T12:25:39.437] error: Malformed RPC of type REQUEST_TERMINATE_JOB(6011) received

[2017-11-29T12:25:39.437] error: slurm_receive_msg_and_forward: Header lengths are longer than data received

[2017-11-29T12:25:39.446] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.446] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.446] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.446] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.447] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

[2017-11-29T12:25:39.447] error: service_connection: slurm_receive_msg: Header lengths are longer than data received
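
The log above repeats a small set of messages many times, so a tally can make the pattern easier to read. The following is only a sketch: the function name tally_messages and the sample text are hypothetical, and it assumes each entry starts with a bracketed timestamp and that any line not starting with "[" is the wrapped tail of the previous entry.

```python
import re
from collections import Counter

# Matches the leading "[timestamp] " so identical messages group together.
TIMESTAMP = re.compile(r"^\[[^\]]+\]\s*")


def tally_messages(log_text):
    """Count distinct log messages, ignoring timestamps and line wrapping."""
    entries = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") or not entries:
            entries.append(line)
        else:
            # Continuation of an entry that was wrapped onto a second line.
            entries[-1] += " " + line
    return Counter(TIMESTAMP.sub("", entry) for entry in entries)


# Hypothetical excerpt in the same shape as the log above.
sample = """
[2017-11-29T12:25:39.424] error: we don't have select plugin type 101
[2017-11-29T12:25:39.424] error: Malformed RPC of type
REQUEST_BATCH_JOB_LAUNCH(4005) received
[2017-11-29T12:25:39.425] error: we don't have select plugin type 101
"""
for message, count in tally_messages(sample).most_common():
    print(count, message)
```

Pasting the full node log into the function in place of the sample would give one count per distinct message, including one line per RPC type.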


Lastly, I changed the config back to linear, restarted and reconfigured, and the 
node log says:


[2017-11-29T12:26:19.617] [6684.0] job_manager exiting with aborted job

[2017-11-29T12:26:19.621] [6684.0] done with job

[2017-11-29T12:26:24.591] Message aggregation disabled

[2017-11-29T12:26:24.592] Resource spec: Reserved system memory limit not configured for this node



Ethan VanMatre
Informatics Research Analyst
Institute on Development and Disability
Oregon Health & Science University
CSLU - GH40
3181 SW Sam Jackson Park Rd
Portland, OR 97239
(503) 346-3764
vanma...@ohsu.edu
