Hello all,

A user received an email from Slurm that one of his jobs was preempted. Normally when a job is preempted, the logs will show something like this:

[2023-03-30T08:19:16.535] [25538.batch] error: *** JOB 25538 ON node07 CANCELLED AT 2023-03-30T08:19:16 DUE TO PREEMPTION ***
[2023-03-30T08:19:16.573] [25538.1] error: *** STEP 25538.1 ON node07 CANCELLED AT 2023-03-30T08:19:16 DUE TO PREEMPTION ***

There was no such entry for this job; what was in the log for the job was this:

[2023-04-24T17:06:24.105] [26446.batch] error: *** JOB 26446 ON node07 CANCELLED AT 2023-04-24T17:06:24 ***
[2023-04-24T17:06:24.105] [26446.1] error: *** STEP 26446.1 ON node07 CANCELLED AT 2023-04-24T17:06:24 ***
[2023-04-24T17:06:24.155] [26446.extern] done with job
[2023-04-24T17:06:25.161] [26446.batch] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status:15
[2023-04-24T17:06:25.163] [26446.batch] done with job
[2023-04-24T17:06:27.462] [26446.1] error: Failed to send MESSAGE_TASK_EXIT: Connection refused
[2023-04-24T17:06:27.464] [26446.1] done with job

It's unclear to me whether this job was actually preempted; perhaps Slurm logs preemption differently for MPI jobs. I do not, however, believe that it was preempted, because he was running on a partition that only the account he was using is permitted to use, and in any case, that partition has the highest partition priority. Moreover, the job immediately restarted (after a requeue, with a new job ID) on the same partition.

Any thoughts as to whether this job was actually preempted, and if not, why the email notification would say it was?

Warmest regards,
Jason

--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms