Hi Robert,
On 2/23/24 17:38, Robert Kudyba via slurm-users wrote:
We switched over from using systemctl for tmp.mount and changed to zram,
e.g.:
modprobe zram
echo 20GB > /sys/block/zram0/disksize
mkfs.xfs /dev/zram0
mount -o discard /dev/zram0 /tmp
[...]
> [2024-02-23T20:26:15.881] [530.exter
On 3/3/24 23:04, John Joseph via slurm-users wrote:
Is SWAP a mandatory requirement?
All our compute nodes are diskless, so no swap on them.
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
On 4/10/24 10:41 pm, archisman.pathak--- via slurm-users wrote:
In our case, that node has been removed from the cluster and cannot be
added back right now (it is being used for some other work). What can we
do in such a case?
Mark the node as "DOWN" in Slurm, this is what we do when we get job
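For example, with a placeholder node name:

scontrol update NodeName=node001 State=DOWN Reason="removed from cluster for other work"

That keeps the scheduler from trying to use it until you bring it back
(e.g. with State=RESUME) once it returns.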
On 5/4/24 4:24 am, Nuno Teixeira via slurm-users wrote:
Any clues?
> ld: error: unknown emulation: elf_aarch64
All I can think is that your ld doesn't like elf_aarch64; from the log
you're posting it looks like that's being injected from the FreeBSD ports
system. Looking at the man page for ld on
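If it's GNU ld from binutils you can see which emulations it supports
with something like:

ld -V | grep -i aarch64

though I'm not sure offhand what the lld-based linker FreeBSD ships
reports for that.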
On 5/6/24 6:38 am, Nuno Teixeira via slurm-users wrote:
Any clues about "elf_aarch64" and "aarch64elf" mismatch?
As I mentioned I think this is coming from the FreeBSD patching that's
being done to the upstream Slurm sources, specifically it looks like
elf_aarch64 is being injected here:
/
On 5/6/24 3:19 pm, Nuno Teixeira via slurm-users wrote:
Fixed with:
[...]
Thanks and sorry for the noise as I really missed this detail :)
So glad it helped! Best of luck with this work.
Hi Jeff!
On 5/15/24 10:35 am, Jeffrey Layton via slurm-users wrote:
I have an Ubuntu 22.04 server where I installed Slurm from the Ubuntu
packages. I now want to install pyxis but it says I need the Slurm
sources. In Ubuntu 22.04, is there a package that has the source code?
How to download them?
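Assuming the deb-src entries are enabled in your apt sources, something
like this should fetch them (slurm-wlm being the source package name, if
I remember right):

apt-get source slurm-wlm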
On 5/22/24 3:33 pm, Brian Andrus via slurm-users wrote:
A simple example is when you have nodes with and without GPUs.
You can build slurmd packages without for those nodes and with for the
ones that have them.
FWIW we have both GPU and non-GPU nodes but we use the same RPMs we
build on both
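For instance, building from the release tarball with NVML support
switched on looks roughly like this (assuming the nvml build conditional
in the bundled slurm.spec and the NVML development files installed on the
build host):

rpmbuild -ta slurm-*.tar.bz2 --with nvml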
On 6/17/24 7:24 am, Bjørn-Helge Mevik via slurm-users wrote:
Also, server must be newer than client.
This is the major issue for the OP - the version rule is:
slurmdbd >= slurmctld >= slurmd and clients
and no more than the permitted skew in versions.
Plus, of course, you have to deal with
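A quick way to see what each piece is actually running:

slurmdbd -V
slurmctld -V
slurmd -V
sinfo --version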
On 6/21/24 3:50 am, Arnuld via slurm-users wrote:
I have 3500+ GPU cores available. You mean each GPU job requires at
least one CPU? Can't we run a job with just GPU without any CPUs?
No, Slurm has to launch the batch script on compute node cores, and it
then has the job of launching the user's processes.
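So even a minimal GPU job carries at least one CPU with it, for example
(script and binary names are just placeholders):

#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
srun ./my_gpu_program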
G'day Sid,
On 7/31/24 5:02 pm, Sid Young via slurm-users wrote:
I've been waiting for node to become idle before upgrading them however
some jobs take a long time. If I try to remove all the packages I assume
that kills the slurmstepd program and with it the job.
Are you looking to do a Slurm upgrade?
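If it's the usual rolling-upgrade situation, draining lets running jobs
finish while keeping new work off the node, e.g. (placeholder node name):

scontrol update NodeName=node001 State=DRAIN Reason="slurm upgrade"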
On 8/15/24 7:04 am, jpuerto--- via slurm-users wrote:
I am referring to the REST API. We have had it installed for a few years and have
recently upgraded it so that we can use v0.0.40. But this most recent version is missing
the "get_user_environment" field which existed in previous versions.
On 10/21/24 4:35 am, laddaoui--- via slurm-users wrote:
It seems like there's an issue with the termination process on these nodes. Any
thoughts on what could be causing this?
That usually means processes wedged in the kernel for some reason, in an
uninterruptible sleep state. You can define an UnkillableStepProgram in
slurm.conf to collect debugging information when that happens.
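For example, a slurm.conf fragment along these lines (the script path is
just a placeholder):

UnkillableStepTimeout=120
UnkillableStepProgram=/usr/local/sbin/unkillable-debug.sh

and on the node itself you can usually spot the culprits by looking for
processes stuck in the D state:

ps -eo state,pid,user,wchan:32,cmd | awk '$1 == "D"'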
On 10/28/24 10:56 am, Bhaskar Chakraborty via slurm-users wrote:
Is there an option in slurm to launch a custom script at the time of job
submission through sbatch or salloc? The script should run with the
submitting user's permissions, in the submission directory.
I think you are after the cli_filter functionality.
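It's turned on in slurm.conf with:

CliFilterPlugins=lua

and then a cli_filter.lua alongside slurm.conf on the submit hosts; the
example cli_filter.lua in the Slurm source shows the hook functions it
expects. Unlike job_submit (which runs inside slurmctld), the cli_filter
hooks run in the sbatch/salloc/srun process itself, so they execute as
the submitting user in the submission directory.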
Hi Ole,
On 10/22/24 11:04 am, Ole Holm Nielsen via slurm-users wrote:
Some time ago it was recommended that UnkillableStepTimeout values above
127 (or 256?) should not be used, see
https://support.schedmd.com/show_bug.cgi?id=11103. I don't know if this
restriction is still valid with recent Slurm versions.
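You can check what a given system is actually using with:

scontrol show config | grep -i unkillable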
On 11/27/24 11:38 am, Kent L. Hanson via slurm-users wrote:
I have restarted the slurmctld and slurmd services several times. I
hashed the slurm.conf files. They are the same. I ran “sinfo -a” as root
with the same result.
Are your nodes in the `FUTURE` state perhaps? What does this show?
si
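For example, picking one of the missing nodes (the name is a
placeholder):

scontrol show node node001 | grep -i State

I think scontrol will still report a node that sinfo is hiding, though
I'm not certain that holds for the FUTURE state.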
On 2/3/25 2:33 pm, Steven Jones via slurm-users wrote:
Just built 4 x rocky9 nodes and I do not get that error (but I get
another I know how to fix, I think) so holistically I am thinking the
version difference is too large.
Oh I think I missed this - when you say version difference do you m
On 2/10/25 7:05 am, Michał Kadlof via slurm-users wrote:
I observed similar symptoms when we had issues with the shared Lustre
file system. When the file system couldn't complete an I/O operation,
the process in Slurm remained in the CG state until the file system
became responsive again. An a
On 3/4/25 5:23 pm, Steven Jones via slurm-users wrote:
However mysql -u slurm -p works just fine so it seems to be a config
error for slurmdbd
Try:
mysql -h 127.0.0.1 -u slurm -p
IIRC without that it'll try a UNIX domain socket and not try and connect
via TCP/IP.
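If that works, the matching slurmdbd.conf entries would look something
like this (the password is obviously a placeholder):

StorageType=accounting_storage/mysql
StorageHost=127.0.0.1
StoragePort=3306
StorageUser=slurm
StoragePass=changeme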
Hi Steven,
On 4/9/25 5:00 pm, Steven Jones via slurm-users wrote:
Apr 10 10:28:52 vuwunicohpcdbp1.ods.vuw.ac.nz slurmdbd[2413]: slurmdbd:
fatal: This host not configured to run SlurmDBD ((vuwunicohpcdbp1 or
vuwunicohp>
^^^ that's the critical error message, and it's reporting that because
the local hostname doesn't match the DbdHost (or DbdBackupHost) value
set in slurmdbd.conf.
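Worth comparing what the box calls itself with what the config says,
e.g.:

hostname -s
grep -i DbdHost /etc/slurm/slurmdbd.conf

(adjust the path if your slurmdbd.conf lives elsewhere).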
Hiya,
On 4/15/25 7:03 pm, lyz--- via slurm-users wrote:
Hi, Chris. Thank you for continuing to pay attention to this issue.
I followed your instruction, and this is the output:
[root@head1 ~]# systemctl cat slurmd | fgrep Delegate
Delegate=yes
That looks good to me, thanks for sharing that!
On 4/15/25 6:57 pm, lyz--- via slurm-users wrote:
Hi, Sean. It's the latest slurm version.
[root@head1 ~]# sinfo --version
slurm 22.05.3
That's quite old (and no longer supported); the oldest still-supported
version is 23.11.10, and 24.11.4 came out recently.
What does the cgroup.conf file on that node look like?
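For GPU limits the interesting part is whether devices are constrained,
i.e. whether cgroup.conf has something like:

ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes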
On 4/15/25 12:55 pm, Sean Crosby via slurm-users wrote:
What version of Slurm are you running and what's the contents of your
gres.conf file?
Also what does this say?
systemctl cat slurmd | fgrep Delegate
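For reference, a hand-written gres.conf for a four-GPU node looks
something like this (node name and GPU type are placeholders; a single
AutoDetect=nvml line does the same job if slurmd was built with NVML
support):

NodeName=gpunode[01-04] Name=gpu Type=a100 File=/dev/nvidia[0-3]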
On 4/14/25 6:27 am, lyz--- via slurm-users wrote:
This command is intended to limit user 'lyz' to using a maximum of 2 GPUs.
However, when the user submits a job using srun and specifies CUDA devices
0, 1, 2, and 3 in the job script, or sets
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3", the job still utilises all
four GPUs.
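Presumably the limit was set with something along these lines (just an
illustration, not necessarily the exact command used):

sacctmgr modify user lyz set MaxTRESPerJob=gres/gpu=2

Note that the limit only restricts what the job can request; without
ConstrainDevices=yes in cgroup.conf the processes on the node can still
see and use every GPU, regardless of what was allocated.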