Thanks Michael. As far as I’m aware, set -e (errexit) is the same as using
#!/bin/bash -e as the interpreter line. As I mentioned in the original post, I
would like to avoid that: it involves modifying scripts (although to a lesser
extent), and it would end script execution for other runtime errors or non-0
On Thu, 10 Jun 2021 07:20:51 +
Sean Crosby wrote:
> We use sacctmgr list stats for our Slurmdbd check
>
> Our Nagios check is
>
> RESULT=$(/usr/local/slurm/latest/bin/sacctmgr list stats)
> if [ $? -ne 0 ]
> then
> echo "ERROR: cannot connect to database"
> exit 2
> fi
> ech
On 10/06/2021 11:35, Gestió Servidors wrote:
I'm no SLURM expert, but a jobfile like this should work:
#!/bin/bash
#
#SBATCH --job-name=N2n4
#SBATCH --partition=cuda.q
#SBATCH --output=N2n4-CUDA.txt
#SBATCH -N 1 # number of nodes with the first GPU
#SBATCH -n 2 # number of cores
#SBATCH --g
On 08/06/2021 15:55, Gestió Servidors wrote:
Have you tried defining it as a heterogeneous job?
https://slurm.schedmd.com/heterogeneous_jobs.html
#SBATCH hetjob
for new SLURM versions or
#SBATCH packjob
for older ones
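As a rough sketch only (untested here), a heterogeneous job file asking for one GPU of a different type on each of two nodes might look like the one written out below. The GRES type names gpu_type_a and gpu_type_b and the program ./my_app are hypothetical placeholders; the real type names are whatever the site's gres.conf defines. The job name, partition and output file are taken from the jobfile earlier in the thread, and srun --het-group assumes a SLURM version recent enough to use the hetjob keyword.

```shell
#!/bin/bash
# Write a two-component heterogeneous job script to hetjob.sh.
# Submit it afterwards with: sbatch hetjob.sh
# NOTE: gpu_type_a, gpu_type_b and ./my_app are hypothetical placeholders.
cat > hetjob.sh <<'EOF'
#!/bin/bash
#
#SBATCH --job-name=N2n4
#SBATCH --partition=cuda.q
#SBATCH --output=N2n4-CUDA.txt
# Component 0: one node, one task, one GPU of the first type
#SBATCH -N 1 -n 1 --gres=gpu:gpu_type_a:1
#SBATCH hetjob
# Component 1: one node, one task, one GPU of the second type
#SBATCH --partition=cuda.q
#SBATCH -N 1 -n 1 --gres=gpu:gpu_type_b:1

# Launch one step that spans both components
srun --het-group=0,1 ./my_app
EOF
```

On older versions the separator line would be "#SBATCH packjob" instead.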
HIH,
Diego
Hi,
Today, doing some tests, I have not got a solution to
Hello,
No, with "#SBATCH --gres=gpu:2" SLURM searches for a node with 2 GPUs, but I
need to run my job on 2 nodes using 2 GPUs, with one GPU on each node. If both
GPUs are the same, the job runs OK, but I want to test running my job on two
nodes: one offers a GeForceRTX3080 and the second offers a GeForceRTX20
Hi,
I was wondering about the following. If I have a reservation with
accounts associated with it and I delete the account with sacctmgr, I do
not get any message; it just deletes the account. But then when you want
to update the reservation (with the deleted account still associated with
it) you
We use sacctmgr list stats for our Slurmdbd check
Our Nagios check is
RESULT=$(/usr/local/slurm/latest/bin/sacctmgr list stats)
if [ $? -ne 0 ]
then
echo "ERROR: cannot connect to database"
exit 2
fi
echo "$RESULT" | head -n 4
exit 0
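
The same check can also be wrapped in a function that tests the exit status of sacctmgr directly, which may make it easier to reuse and to try out without a real slurmdbd. This is only a sketch: the SACCTMGR variable is an assumed override hook and is not part of our actual check, and the default path is simply the one used above.

```shell
#!/bin/bash
# Nagios plugin exit codes used here: 0 = OK, 2 = CRITICAL.
# SACCTMGR is an assumed override hook (not in the original check).
SACCTMGR="${SACCTMGR:-/usr/local/slurm/latest/bin/sacctmgr}"

check_slurmdbd() {
    local result
    # Declare and assign separately so the status tested is sacctmgr's,
    # not the status of the `local` builtin.
    if ! result=$("$SACCTMGR" list stats 2>&1); then
        echo "ERROR: cannot connect to database"
        return 2
    fi
    # Print the first few lines of the stats as the plugin output.
    echo "$result" | head -n 4
    return 0
}
```

A plugin script would then end with "check_slurmdbd; exit $?".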
Sean
From: sl