Hi All,
I am submitting a job to the cluster with the command "sbatch slurm1.sh", but I
get the error shown below and the process stops partway through. I have also
attached the slurm1.sh file, which contains the details.
*** Oops -- I cannot open the LAM help file.
*** I tried looking for it in the follow
Hi,
As Paul mentioned, we once encountered a starvation issue with the
backfill algorithm, and have since set bf_window to match the maximum
running time across all the partitions. This could be the case here.
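For reference, here is a rough sketch of what that setting might look like in slurm.conf. It assumes the longest partition time limit is 7 days, which is a made-up value for illustration and not taken from this thread. bf_window is given in minutes:

```
# slurm.conf fragment (illustrative only; the 7-day limit is an assumption)
# bf_window is in minutes: 7 days * 24 h * 60 min = 10080
SchedulerParameters=bf_window=10080
```

If SchedulerParameters already lists other options on your cluster, bf_window would be added to that same comma-separated list rather than on a second line.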
Also make sure that the jobs can indeed run on the non-GPU nodes (we
constantly encounter