[2020-11-18T10:19:35.388] task_p_slurmd_batch_request: 110
[2020-11-18T10:19:35.388] task/affinity: job 110 CPU input mask for node: 0x00000000000000000000000000000003
[2020-11-18T10:19:35.388] task/affinity: job 110 CPU final HW mask for node: 0x00000000000000010000000000000001
[2020-11-18T10:19:35.388] debug: [job 110] attempting to run prolog [/cm/local/apps/cmd/scripts/prolog]
[2020-11-18T10:19:35.392] _run_prolog: run job script took usec=4085
[2020-11-18T10:19:35.392] _run_prolog: prolog with lock for job 110 ran for 0 seconds
[2020-11-18T10:19:35.393] Launching batch job 110 for UID 0
[2020-11-18T10:19:35.398] debug: AcctGatherEnergy NONE plugin loaded
[2020-11-18T10:19:35.398] debug: AcctGatherProfile NONE plugin loaded
[2020-11-18T10:19:35.398] debug: AcctGatherInterconnect NONE plugin loaded
[2020-11-18T10:19:35.398] debug: AcctGatherFilesystem NONE plugin loaded
[2020-11-18T10:19:35.398] debug: switch NONE plugin loaded
[2020-11-18T10:19:35.398] debug: init: Gres GPU plugin loaded
[2020-11-18T10:19:35.400] [110.batch] Considering each NUMA node as a socket
[2020-11-18T10:19:35.400] [110.batch] debug: CPUs:128 Boards:1 Sockets:8 CoresPerSocket:8 ThreadsPerCore:2
[2020-11-18T10:19:35.400] [110.batch] debug: Job accounting gather cgroup plugin loaded
[2020-11-18T10:19:35.401] [110.batch] debug: cont_id hasn't been set yet not running poll
[2020-11-18T10:19:35.401] [110.batch] debug: laying out the 1 tasks on 1 hosts martin-cn-01 dist 2
[2020-11-18T10:19:35.401] [110.batch] debug: Message thread started pid = 267029
[2020-11-18T10:19:35.401] [110.batch] task affinity plugin loaded with CPU mask 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ffffffffffffffffffffffffffffffff
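
The three masks above are plain hex bitmasks over the node's 128 logical CPUs (Sockets:8 x CoresPerSocket:8 x ThreadsPerCore:2); the long plugin mask simply has all 128 low bits set, i.e. the whole node. A minimal Python sketch, using only values copied from this log, decodes the two job masks: the input mask selects abstract CPUs 0-1, and the final hardware mask selects logical CPUs 0 and 64, the two hyperthreads of physical core 0 that reappear below as "job physical cores are '0,64'".

    # Decode a Slurm hex CPU affinity mask into the list of CPU ids it selects.
    def cpus_in_mask(mask_hex):
        value = int(mask_hex, 16)
        return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]

    # Mask values copied from the log entries above.
    print(cpus_in_mask("0x00000000000000000000000000000003"))  # [0, 1]  abstract CPUs
    print(cpus_in_mask("0x00000000000000010000000000000001"))  # [0, 64] hardware threads of core 0
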
[2020-11-18T10:19:35.402] [110.batch] debug: task/cgroup: now constraining jobs allocated cores
[2020-11-18T10:19:35.402] [110.batch] debug: task/cgroup/memory: total:515440M allowed:100%(enforced), swap:0%(enforced), max:100%(515440M) max+swap:100%(1030880M) min:30M kmem:100%(515440M enforced) min:30M swappiness:100(set)
[2020-11-18T10:19:35.402] [110.batch] debug: task/cgroup: now constraining jobs allocated memory
[2020-11-18T10:19:35.402] [110.batch] debug: task/cgroup: unable to open /cm/shared/apps/slurm/var/etc/flamingo/cgroup_allowed_devices_file.conf: No such file or directory
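
The file task/cgroup could not open here is the device whitelist used when constraining job devices (cgroup.conf's AllowedDevicesFile points at it; the path shown is site-specific). As an illustration only, the expected format is one device path or glob per line, to be adapted to local policy, for example:

    /dev/null
    /dev/urandom
    /dev/zero
    /dev/cpu/*/*
    /dev/pts/*
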
[2020-11-18T10:19:35.402] [110.batch] debug: task/cgroup: now constraining jobs allocated devices
[2020-11-18T10:19:35.402] [110.batch] debug: task/cgroup: loaded
[2020-11-18T10:19:35.402] [110.batch] debug: Checkpoint plugin loaded: checkpoint/none
[2020-11-18T10:19:35.402] [110.batch] Munge credential signature plugin loaded
[2020-11-18T10:19:35.402] [110.batch] debug: job_container none plugin loaded
[2020-11-18T10:19:35.402] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/freezer/slurm' already exists
[2020-11-18T10:19:35.402] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/freezer/slurm/uid_0' already exists
[2020-11-18T10:19:35.403] [110.batch] debug: spank: opening plugin stack /etc/slurm/plugstack.conf
[2020-11-18T10:19:35.403] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/cpuset/slurm' already exists
[2020-11-18T10:19:35.403] [110.batch] debug: task/cgroup: job abstract cores are '0'
[2020-11-18T10:19:35.403] [110.batch] debug: task/cgroup: step abstract cores are '0'
[2020-11-18T10:19:35.403] [110.batch] debug: task/cgroup: job physical cores are '0,64'
[2020-11-18T10:19:35.403] [110.batch] debug: task/cgroup: step physical cores are '0,64'
[2020-11-18T10:19:35.403] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/cpuset/slurm/uid_0' already exists
[2020-11-18T10:19:35.406] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/memory/slurm' already exists
[2020-11-18T10:19:35.406] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/memory/slurm/uid_0' already exists
[2020-11-18T10:19:35.406] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/memory/slurm/uid_0/job_110' already exists
[2020-11-18T10:19:35.406] [110.batch] task/cgroup: /slurm/uid_0/job_110: alloc=4096MB mem.limit=4096MB memsw.limit=4096MB
[2020-11-18T10:19:35.406] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/memory/slurm/uid_0/job_110/step_batch' already exists
[2020-11-18T10:19:35.406] [110.batch] task/cgroup: /slurm/uid_0/job_110/step_batch: alloc=4096MB mem.limit=4096MB memsw.limit=4096MB
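
The two task/cgroup lines above apply the job's 4096 MB allocation as both the memory and memory+swap limit (the earlier swap:0%(enforced) means no extra swap headroom). A minimal sketch of how one could verify this on martin-cn-01 while the step cgroup still exists, assuming the cgroup v1 paths shown in the log:

    # Read the limits slurmstepd just set on the batch step's memory cgroup.
    cg = "/sys/fs/cgroup/memory/slurm/uid_0/job_110/step_batch"
    for name in ("memory.limit_in_bytes", "memory.memsw.limit_in_bytes"):
        # memory.memsw.* is only present when the kernel has swap accounting enabled
        with open(f"{cg}/{name}") as fh:
            print(name, int(fh.read()) // (1024 * 1024), "MB")  # both report 4096 MB
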
[2020-11-18T10:19:35.406] [110.batch] debug: _oom_event_monitor: started.
[2020-11-18T10:19:35.406] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/devices/slurm' already exists
[2020-11-18T10:19:35.406] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/devices/slurm/uid_0' already exists
[2020-11-18T10:19:35.438] [110.batch] debug level = 2
[2020-11-18T10:19:35.438] [110.batch] starting 1 tasks
[2020-11-18T10:19:35.438] [110.batch] task 0 (267034) started 2020-11-18T10:19:35
[2020-11-18T10:19:35.438] [110.batch] debug: Setting slurmstepd oom_adj to -1000
[2020-11-18T10:19:35.438] [110.batch] debug: jobacct_gather_cgroup_cpuacct_attach_task: jobid 110 stepid 4294967294 taskid 0 max_task_id 0
[2020-11-18T10:19:35.439] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/cpuacct/slurm' already exists
[2020-11-18T10:19:35.439] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/cpuacct/slurm/uid_0' already exists
[2020-11-18T10:19:35.439] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/cpuacct/slurm/uid_0/job_110' already exists
[2020-11-18T10:19:35.439] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/cpuacct/slurm/uid_0/job_110/step_batch' already exists
[2020-11-18T10:19:35.439] [110.batch] debug: jobacct_gather_cgroup_memory_attach_task: jobid 110 stepid 4294967294 taskid 0 max_task_id 0
[2020-11-18T10:19:35.439] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/memory/slurm' already exists
[2020-11-18T10:19:35.439] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/memory/slurm/uid_0' already exists
[2020-11-18T10:19:35.439] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/memory/slurm/uid_0/job_110' already exists
[2020-11-18T10:19:35.439] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/memory/slurm/uid_0/job_110/step_batch' already exists
[2020-11-18T10:19:35.439] [110.batch] debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/memory/slurm/uid_0/job_110/step_batch/task_0' already exists
[2020-11-18T10:19:35.440] [110.batch] debug: task_p_pre_launch: affinity jobid 110.4294967294, task:0 bind:256
[2020-11-18T10:19:35.440] [110.batch] task_p_pre_launch: Using sched_affinity for tasks
[2020-11-18T10:19:39.770] debug: _rpc_terminate_job, uid = 450
[2020-11-18T10:19:39.770] debug: task_p_slurmd_release_resources: affinity jobid 110
[2020-11-18T10:19:39.770] debug: credential for job 110 revoked
[2020-11-18T10:19:39.771] [110.batch] debug: Handling REQUEST_SIGNAL_CONTAINER
[2020-11-18T10:19:39.771] [110.batch] debug: _handle_signal_container for step=110.4294967294 uid=450 signal=18
[2020-11-18T10:19:39.771] [110.batch] Sent signal 18 to 110.4294967294
[2020-11-18T10:19:39.771] [110.batch] debug: Handling REQUEST_SIGNAL_CONTAINER
[2020-11-18T10:19:39.771] [110.batch] debug: _handle_signal_container for step=110.4294967294 uid=450 signal=15
[2020-11-18T10:19:39.771] [110.batch] error: *** JOB 110 ON martin-cn-01 CANCELLED AT 2020-11-18T10:19:39 ***
[2020-11-18T10:19:39.771] [110.batch] Sent signal 15 to 110.4294967294
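
The cancellation arrives as two REQUEST_SIGNAL_CONTAINER RPCs: signal 18 (SIGCONT, so a suspended step can wake up and handle what follows) and then signal 15 (SIGTERM). The step id 4294967294 is just 0xFFFFFFFE, the sentinel this Slurm version uses for the batch step. A quick check of those numbers, assuming the standard Linux signal layout:

    import signal

    print(signal.Signals(18).name)  # SIGCONT
    print(signal.Signals(15).name)  # SIGTERM
    print(hex(4294967294))          # 0xfffffffe, the batch-step sentinel id
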
[2020-11-18T10:19:39.771] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:39.792] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:39.842] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:39.942] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:40.442] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:41.442] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:42.443] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:43.443] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:44.443] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:45.443] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:46.443] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:47.443] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:48.444] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:49.444] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:19:50.444] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:20:00.444] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:20:09.445] [110.batch] debug: Handling REQUEST_STEP_TERMINATE
[2020-11-18T10:20:09.445] [110.batch] debug: _handle_terminate for step=110.4294967294 uid=0
[2020-11-18T10:20:09.445] [110.batch] Sent SIGKILL signal to 110.4294967294
[2020-11-18T10:20:09.445] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:20:09.465] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:20:09.515] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:20:09.616] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:20:10.116] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:20:11.116] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:20:11.116] [110.batch] debug: Handling REQUEST_STEP_TERMINATE
[2020-11-18T10:20:11.116] [110.batch] debug: _handle_terminate for step=110.4294967294 uid=0
[2020-11-18T10:20:11.116] [110.batch] Sent SIGKILL signal to 110.4294967294
[2020-11-18T10:20:12.116] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:20:12.116] [110.batch] debug: Handling REQUEST_STEP_TERMINATE
[2020-11-18T10:20:12.116] [110.batch] debug: _handle_terminate for step=110.4294967294 uid=0
[2020-11-18T10:20:59.282] debug: _rpc_terminate_job, uid = 450
[2020-11-18T10:20:59.282] debug: task_p_slurmd_release_resources: affinity jobid 110
[2020-11-18T10:21:00.121] [110.batch] debug: Handling REQUEST_STATE
[2020-11-18T10:21:00.121] [110.batch] debug: Handling REQUEST_STEP_TERMINATE
[2020-11-18T10:21:00.121] [110.batch] debug: _handle_terminate for step=110.4294967294 uid=0
[2020-11-18T10:21:00.121] [110.batch] Sent SIGKILL signal to 110.4294967294
[2020-11-18T10:21:10.000] [110.batch] error: *** JOB 110 STEPD TERMINATED ON martin-cn-01 AT 2020-11-18T10:21:09 DUE TO JOB NOT ENDING WITH SIGNALS ***
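
The intervals in this tail tell the story: the step ignored SIGTERM for 30 seconds before slurmd escalated to SIGKILL, then survived repeated SIGKILLs for another 60 seconds before slurmstepd gave up, which is usually a sign the task is stuck in uninterruptible I/O (D state) rather than merely slow to exit. Those intervals match the defaults for KillWait (30 s) and UnkillableStepTimeout (60 s), assuming neither was overridden in slurm.conf; the arithmetic from the timestamps above:

    from datetime import datetime

    sigterm = datetime.fromisoformat("2020-11-18T10:19:39")        # SIGTERM sent to 110.batch
    first_sigkill = datetime.fromisoformat("2020-11-18T10:20:09")  # first SIGKILL
    stepd_gave_up = datetime.fromisoformat("2020-11-18T10:21:09")  # "STEPD TERMINATED" message
    print((first_sigkill - sigterm).total_seconds())        # 30.0 -> KillWait default
    print((stepd_gave_up - first_sigkill).total_seconds())  # 60.0 -> UnkillableStepTimeout default
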
[2020-11-18T10:21:10.000] [110.batch] debug: get_exit_code task 0 killed by cmd
[2020-11-18T10:21:10.000] [110.batch] job 110 completed with slurm_rc = 4001, job_rc = -1
[2020-11-18T10:21:10.000] [110.batch] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:4001 status:-1
[2020-11-18T10:21:10.001] [110.batch] debug: Message thread exited
[2020-11-18T10:21:10.001] [110.batch] done with job
[2020-11-18T10:21:10.121] debug: Waiting for job 110's prolog to complete
[2020-11-18T10:21:10.121] debug: Finished wait for job 110's prolog to complete
[2020-11-18T10:21:10.121] debug: [job 110] attempting to run epilog [/cm/local/apps/cmd/scripts/epilog]
[2020-11-18T10:21:10.124] debug: completed epilog for jobid 110
[2020-11-18T10:21:10.124] debug: Job 110: sent epilog complete msg: rc = 0