On Jul 26, 2019, at 11:28 AM, Jeffrey Frey <f...@udel.edu> wrote:

If you check the source code (src/slurmctld/job_mgr.c) this error is indeed thrown when slurmctld unpacks job state files. Tracing through read_slurm_conf() -> load_all_job_state() -> _load_job_state():

    part_ptr = find_part_record (partition);
    if (part_ptr == NULL) {
        char *err_part = NULL;
        part_ptr_list = get_part_list(partition, &err_part);
        if (part_ptr_list) {
            part_ptr = list_peek(part_ptr_list);
            if (list_count(part_ptr_list) == 1)
                FREE_NULL_LIST(part_ptr_list);
        } else {
            verbose("Invalid partition (%s) for JobId=%u",
                    err_part, job_id);
            xfree(err_part);
            /* not fatal error, partition could have been
             * removed, reset_job_bitmaps() will clean-up
             * this job */
        }
    }

The comment after the error implies that this is not really a problem, and that it occurs specifically when a partition has been removed.

On Jul 26, 2019, at 11:15 AM, Brian Andrus <toomuc...@gmail.com> wrote:

All,

I have a cloud based cluster using slurm 19.05.0-1
I removed one of the partitions, but now every time I start slurmctld I get some errors:

slurmctld[63042]: error: Invalid partition (mpi-h44rs) for JobId=52545
slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-01
slurmctld[63042]: error: node_name2bitmap: invalid node specified mpi-h44rs-01
.
.
slurmctld[63042]: error: _find_node_record(756): lookup failure for mpi-h44rs-05
slurmctld[63042]: error: node_name2bitmap: invalid node specified mpi-h44rs-05
slurmctld[63042]: error: Invalid nodes (mpi-h44rs-[01-05]) for JobId=52545

I suspect this is in the saved state directory and if I were to down the entire cluster and delete those files, it would clear it up, but I prefer to not have to down the cluster...

Is there a way to clean up "phantom" nodes and partitions that were deleted?

Brian Andrus

::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE 19716
Office: (302) 831-6034  Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::
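
For anyone who wants to see the non-fatal path from _load_job_state() in isolation, here is a minimal standalone sketch of the same pattern: look up the saved partition name, log a message if it is gone, and keep loading the remaining jobs. It is not Slurm code. The demo_part and demo_job types and the demo_find_part() and demo_load_jobs() helpers are hypothetical stand-ins invented for illustration, and the sketch leaves out the get_part_list() fallback for multi-partition jobs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a partition record (not Slurm's own type). */
struct demo_part {
    const char *name;
};

/* Hypothetical stand-in for a job recovered from a state file. */
struct demo_job {
    unsigned int job_id;
    const char *partition;        /* partition name saved with the job */
    const struct demo_part *part; /* resolved record, NULL if it is gone */
};

/* Linear search by name; returns NULL when the partition no longer exists. */
const struct demo_part *demo_find_part(const struct demo_part *parts,
                                       size_t n_parts, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n_parts; i++) {
        if (strcmp(parts[i].name, name) == 0)
            return &parts[i];
    }
    return NULL;
}

/*
 * Resolve the partition of every saved job. A missing partition is
 * logged and counted but does not stop the loop, mirroring the
 * "not fatal error" comment in the quoted snippet.
 * Returns the number of jobs whose partition could not be found.
 */
size_t demo_load_jobs(struct demo_job *jobs, size_t n_jobs,
                      const struct demo_part *parts, size_t n_parts)
{
    size_t missing = 0;

    for (size_t i = 0; i < n_jobs; i++) {
        jobs[i].part = demo_find_part(parts, n_parts, jobs[i].partition);
        if (jobs[i].part == NULL) {
            fprintf(stderr, "Invalid partition (%s) for JobId=%u\n",
                    jobs[i].partition, jobs[i].job_id);
            missing++;
        }
    }
    return missing;
}
```

Calling demo_load_jobs() with a job that still names a removed partition leaves that job's part pointer NULL and moves on to the next job, which is the behaviour the source comment describes; the later cleanup step in Slurm (reset_job_bitmaps(), per that comment) is not modelled here.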