* Heckes, Frank <hec...@mps.mpg.de> [210413 12:04]:
> This results from a management question: how long do jobs have to wait
> (in s, min, h, days) before they get executed, and how many jobs are
> waiting (are queued) for each partition in a certain time interval?
> The first one is easy to find with sacct: submit and start times +
> difference + averaging.

Hi Frank,

depending on the definition of "waiting time", the "reserved" field from
sacct may be more appropriate than "start" minus "submit". For dependency
jobs (aka chain jobs), for example, the latter also counts the time a job
had to wait for another job to finish, whereas "reserved" only starts
counting once a job becomes eligible.

However, the "eligible" and "reserved" fields in sacct are also set or
increased if a job has hit a resource throttling limit, which may be
something you want to factor out of the job waiting time as well.
Unfortunately, I haven't found any metric in sacct that counts only (or
allows one to derive) the time a job had to wait just for sufficient
resources to become available. Maybe someone else has?

> The second is a bit cumbersome, so I wonder whether a 'solution' is
> already around. The easiest way is to monitor from the beginning and
> store the squeue output for later evaluation. Unfortunately I didn’t
> do that.

Not sure if this is a solution for you, but I think you can at least
resample this retrospectively from sacct by using something like

    sacct -a -X -S 2021-04-01T00:00:00 -s PD -o JobID,User,Partition

This will return job records for all jobs that were in pending state at
the specified time.

Best regards
Jürgen

> Cheers,
> -Frank
>
> > The "slurmacct" command prints (possibly for a specified partition) the
> > average job waiting time while Pending in the queue, but not the queue
> > length information.
> >
> > It may be difficult to answer your question from the Slurm database.
> > The sacct command displays accounting data for all jobs and job steps,
> > but not directly for partitions.
> >
> > There are other Slurm monitoring tools which perhaps can supply the
> > data you are looking for. You could ask this list again.
> >
> > /Ole
>
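
P.S. To get from the sacct call above to queue lengths per partition over a
time interval, one could repeat that call for a series of timestamps and
count the returned records per partition. Below is a minimal Python sketch
of that idea, not a tested tool. The function names are made up for
illustration, and I added -n (no header) and -P (pipe-separated output) to
the sacct call so that the output is easy to parse. The Partition field is
counted as printed, and a job that is pending at several sample times is
counted once in each sample.

```python
import subprocess
from collections import Counter
from datetime import datetime, timedelta


def count_pending_by_partition(sacct_output: str) -> Counter:
    """Count job records per partition in 'JobID|User|Partition' lines."""
    counts = Counter()
    for line in sacct_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 3 or not fields[0]:
            continue  # skip blank or malformed lines
        counts[fields[2]] += 1
    return counts


def pending_at(timestamp: datetime) -> Counter:
    """Ask sacct which jobs were pending at `timestamp`.

    Needs a working Slurm accounting setup; the flags are the ones from
    the mail plus -n and -P (my additions for easier parsing).
    """
    cmd = [
        "sacct", "-a", "-X", "-n", "-P",
        "-S", timestamp.strftime("%Y-%m-%dT%H:%M:%S"),
        "-s", "PD",
        "-o", "JobID,User,Partition",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return count_pending_by_partition(result.stdout)


def sample_queue_lengths(start: datetime, end: datetime, step: timedelta) -> dict:
    """Sample the per-partition queue length every `step` from start to end."""
    samples = {}
    t = start
    while t <= end:
        samples[t] = pending_at(t)
        t += step
    return samples


if __name__ == "__main__":
    # Made-up sacct output, for illustration only (no Slurm needed here).
    demo = "1001|alice|batch\n1002|bob|gpu\n1003|alice|batch\n"
    print(dict(count_pending_by_partition(demo)))
```

On a cluster, something like sample_queue_lengths(datetime(2021, 4, 1),
datetime(2021, 4, 2), timedelta(hours=1)) would then give one counter per
hour. Each sample is a separate sacct query against the accounting
database, so a coarse step is probably the kinder choice.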