Dear slurm users,
I have some specific jobs that must not be terminated, otherwise they need to
be rerun from the beginning. Can we simply apply some setting (either as a
user or as an administrator) so that these jobs will not be preempted? Thanks.
with regards,
Peter
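For context, a minimal sketch of the administrator-side settings involved, assuming a partition-priority preemption setup; the partition name, node list and time value below are placeholders:

# slurm.conf (excerpt) -- hypothetical values
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE
# jobs must run at least this long before they can be preempted
PreemptExemptTime=01:00:00
# a partition whose jobs are never preempted
PartitionName=nopreempt Nodes=node[01-06] PreemptMode=OFF

A per-QOS PreemptExemptTime can also be set with sacctmgr if preempt/qos is in use.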
Hi all
I have cluster with 6 nodes, 2 GPUs per node, 256 GB of RAM per each node.
I'm interested in the status of one node (its name is node05). There is a job on
this node (38 cores, 4 GB per core, 152 GB total used memory on the node).
When I run
scontrol show node node05, I get the following output:
Hi all,
I have some jobs which write error messages to stderr, and I've noticed
that the stderr output is not being written to file. Here is a simple
reproduction case:
test.sh:
#!/bin/bash
echo "out"
echo "err" >&2
echo "err 2" 1>&2
>&2 echo "err 3"
echo "err 4" >/dev/stderr
echo "err 5" 1>/dev
Do they realize they can chain scripts? Have the script that is submitted
to sbatch be something like:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --job-name="Test Job"
echo "do something"
/opt/path/script2.sh
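A related pattern, if the stages should be separate jobs rather than one script calling the next: chain them with job dependencies (the stage script names here are hypothetical):

# submit stage1, then have stage2 wait until it completes successfully
jobid=$(sbatch --parsable stage1.sh | cut -d';' -f1)
sbatch --dependency=afterok:${jobid} stage2.sh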
Folks often use template or wrapper scripts to facilitate administering
a cluster. This may
max_script_size=#
Specify the maximum size of a batch script, in bytes. The default value is 4
megabytes. Larger values may adversely impact system performance.
I have users who've requested an increase to this setting. What system
performance issues might arise from changing that value?
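For reference, this option lives under SchedulerParameters in slurm.conf; a sketch of raising the limit to 16 MB (the value is only an example, and it should be appended to any existing SchedulerParameters list):

# slurm.conf (excerpt) -- hypothetical 16 MB limit
SchedulerParameters=max_script_size=16777216

As I understand it, the usual concern is that slurmctld keeps each pending job's batch script in its state save location and ships it over RPC at launch, so very large scripts inflate controller memory, state-save I/O, and message traffic.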