Re: [slurm-users] Errors after removing partition

2019-07-27 Thread Brian Andrus
The jobs themselves no longer exist. They had completed before I deleted the partition, which is odd to me. I may have done a 'reconfigure' before restarting slurmctld; it was a while ago, so I don't recall. Brian Andrus On 7/26/2019 8:10 PM, Chris Samuel wrote: On 26/7/19 8:28 am, Jeffrey Fre

Re: [slurm-users] Errors after removing partition

2019-07-26 Thread Chris Samuel
On 26/7/19 8:28 am, Jeffrey Frey wrote: If you check the source code (src/slurmctld/job_mgr.c) this error is indeed thrown when slurmctld unpacks job state files. Tracing through read_slurm_conf() -> load_all_job_state() -> _load_job_state(): I don't think that's the actual error that Brian i

Re: [slurm-users] Errors after removing partition

2019-07-26 Thread Jodie H. Sprouse
fyi… Joe is there now staining front entrance & fixing a few minor touchups, nailing baseboard in basement… Lock box is on the house now w/ key in it… On Jul 26, 2019, at 11:28 AM, Jeffrey Frey <f...@udel.edu> wrote: If you check the source code (src/slurmctld/job_mgr.c) this error is i

Re: [slurm-users] Errors after removing partition

2019-07-26 Thread Jeffrey Frey
If you check the source code (src/slurmctld/job_mgr.c) this error is indeed thrown when slurmctld unpacks job state files. Tracing through read_slurm_conf() -> load_all_job_state() -> _load_job_state(): part_ptr = find_part_record (partition); if (part_ptr == NUL
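
For readers following the trace above, here is a minimal, self-contained sketch of the pattern being described: a saved job names a partition, the partition is looked up in the current configuration, and a NULL result produces the "Invalid partition" error. This is not the Slurm source. The names part_table, lookup_partition and restore_job are hypothetical stand-ins, and the real _load_job_state() does considerably more.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a partition record. */
struct part_record {
    const char *name;
};

/* Assumed example configuration: the partitions that still exist. */
static struct part_record part_table[] = {
    { "debug" },
    { "batch" },
};

/* Return the matching record, or NULL if the partition is not configured. */
static struct part_record *lookup_partition(const char *name)
{
    size_t count = sizeof(part_table) / sizeof(part_table[0]);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(part_table[i].name, name) == 0)
            return &part_table[i];
    }
    return NULL;
}

/*
 * Model of restoring one saved job record.
 * Returns 0 if the job's partition still exists, -1 otherwise.
 */
static int restore_job(unsigned int job_id, const char *partition)
{
    struct part_record *part_ptr = lookup_partition(partition);

    if (part_ptr == NULL) {
        fprintf(stderr, "error: Invalid partition (%s) for JobId=%u\n",
                partition, job_id);
        return -1;
    }
    return 0;
}
```

In this sketch, calling restore_job(52545, "mpi-h44rs") against a table that no longer lists mpi-h44rs returns -1 and prints an error in the same shape as the one reported in the original post. A job whose partition is still in the table restores without complaint.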

[slurm-users] Errors after removing partition

2019-07-26 Thread Brian Andrus
All, I have a cloud-based cluster using Slurm 19.05.0-1. I removed one of the partitions, but now every time I start slurmctld I get some errors: slurmctld[63042]: error: Invalid partition (mpi-h44rs) for JobId=52545 slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-01 s