Running scontrol or sinfo from within a job_submit.lua script seems to be opening a big can of worms. It might be doable, but it would scare me: the plugin runs inside slurmctld, so my guess is that the command ends up waiting on the same daemon that is waiting on your script, which would fit the timeout you see. Since it sounds like you only need a fairly limited amount of information, which presumably does not change frequently, perhaps it would be better to have a cron job periodically write the desired information to a file and have job_submit.lua read it from that file?
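
A rough sketch of what the cron side could look like, in case it helps. This is only an illustration, not something I have run on a production cluster: the function names, the `--refresh` flag and any output path are made up, and I am assuming that `sinfo -h -o '%f'` (no header, features column only) prints one comma-separated feature list per line, with `(null)` for nodes that have no features.

```shell
#!/bin/sh
# Hypothetical cron helper: caches the cluster's node features in a file
# so that job_submit.lua can read them without contacting slurmctld.

# Turn sinfo feature output (one comma-separated list per line) into
# one unique feature name per line, dropping blanks and "(null)".
extract_features() {
    tr ',' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -v -e '^$' -e '^(null)$' \
        | sort -u
}

# Write the list to a temp file and rename it into place, so a reader
# never sees a half-written file. If sinfo fails or returns nothing,
# the temp file is empty and the previous list is left untouched.
refresh_features() {
    out=$1
    tmp="${out}.tmp.$$"
    if sinfo -h -o '%f' | extract_features > "$tmp" && [ -s "$tmp" ]; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Only do work when called as: <script> --refresh /path/to/output/file
if [ "${1:-}" = "--refresh" ] && [ -n "${2:-}" ]; then
    refresh_features "$2"
fi
```

A crontab entry would then call the script with `--refresh` and the output path every few minutes, and job_submit.lua could read the file with Lua's standard `io.lines`. The rename step is there because the plugin may read the file at any moment, including while cron is rewriting it.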

On Tue, Oct 11, 2022 at 5:17 PM Groner, Rob <rug...@psu.edu> wrote:

> I am testing a method where, when a job gets submitted asking for specific
> features, then, if those features don't exist, I'll do something.
>
> The job_submit.lua plugin has worked to determine when a job is submitted
> asking for the specific features. I'm at the point of checking if those
> features exist already (the features are part of a nodeset and part of a
> partition....so jobs submitted asking for those features will just go to
> pending if no nodes exist that offer those features). I thought to use
> "sinfo" to get a list of existing features on the system...but it fails to
> run. The same for trying to use scontrol.
>
> When I submit a job that requests the features, and so the sinfo command
> runs, it all hangs for about 10 seconds and then says:
>
> [me@testsch (RC) slurm] sbatch ./gctest_account_test.sh
> sbatch: error: Batch job submission failed: Socket timed out on send/recv operation
>
> In the slurmctld.log, I see:
> [2022-10-10T17:12:13.933] error: slurm_msg_sendto: address:port=10.6.88.99:40100 msg_type=4004: Unexpected missing socket error
>
> I'll note that "sinfo -V" works...but I suspect it's because it's not
> trying to communicate outside of itself with the slurmctld.
>
> Any suggestions on what to try? Or is there a better slurm-ic way to do
> what I'm trying to do?
>
> Rob

--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        paye...@umd.edu
5825 University Research Park            (301) 405-6135
University of Maryland
College Park, MD 20740-3831