On 24/2/24 06:14, Robert Kudyba via slurm-users wrote:
For now I just set it to chmod 777 on /tmp and that fixed the errors. Is
there a better option?
Traditionally /tmp and /var/tmp have been 1777 (that "1" being the
sticky bit, originally invented to indicate that the OS should attempt
to keep the program's text image in swap after it exited).
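Rather than leaving /tmp at 0777, restoring the conventional sticky-bit mode is usually the safer fix; a minimal sketch (the paths are the standard ones, adjust to taste):

  # restore the traditional mode: world-writable, but only a file's owner may delete it
  chmod 1777 /tmp /var/tmp
  ls -ld /tmp /var/tmp    # should now show drwxrwxrwt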
On 26/2/24 12:27 am, Josef Dvoracek via slurm-users wrote:
What is the recommended way to run longer interactive jobs on your systems?
We provide NX for our users and also access via JupyterHub.
We also have high priority QOS's intended for interactive use for rapid
response, but they are cap
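For illustration only (the QOS name and limits below are made up, not taken from the original post), an interactive QOS along those lines might be set up and used like this:

  # create a higher-priority QOS with modest per-user caps for interactive work
  sacctmgr add qos interactive
  sacctmgr modify qos interactive set Priority=1000 MaxWall=08:00:00 MaxTRESPerUser=cpu=8
  # users then request it explicitly when they want a quick interactive session
  salloc --qos=interactive --cpus-per-task=4 --time=01:00:00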
On 15/8/24 10:55 am, jpuerto--- via slurm-users wrote:
Any ideas on whether there's a way to mirror this functionality in v0.0.40?
Sorry for not seeing this sooner. I don't, I'm afraid!
All the best,
Chris
On 22/8/24 11:18 am, jpuerto--- via slurm-users wrote:
Do you have a link to that code? Haven't had any luck finding that repo
It's here (on the 23.11 branch):
https://github.com/SchedMD/slurm/tree/slurm-23.11/src/slurmrestd/plugins/openapi/dbv0.0.38
On 27/8/24 10:26 am, jpuerto--- via slurm-users wrote:
Is anyone in contact with the development team?
Folks with a support contract can submit bugs at
https://support.schedmd.com/
I feel that this is pretty basic functionality that was removed from the REST API without
warning. Consideri
On 26/8/24 8:40 am, Di Bernardini, Fabio via slurm-users wrote:
Hi everyone, for accounting reasons, I need to create only one job
across two or more federated clusters with two or more srun steps.
The limitations for heterogeneous jobs say:
https://slurm.schedmd.com/heterogeneous_jobs.html#li
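For reference, a single-cluster heterogeneous job with separate srun steps looks roughly like the sketch below (component sizes and programs are purely illustrative); whether this can span a federation is exactly what that limitations page addresses:

  #!/bin/bash
  #SBATCH --ntasks=1 --mem=4G
  #SBATCH hetjob
  #SBATCH --ntasks=16 --mem=32G
  # each srun step targets one heterogeneous component
  srun --het-group=0 ./coordinator &
  srun --het-group=1 ./workers
  wait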
On 2/2/25 2:46 pm, Steven Jones via slurm-users wrote:
[2025-01-30T19:45:29.024] error: Security violation, ping RPC from uid 12002
Looking at the code that seems to come from this code:
if (!_slurm_authorized_user(msg->auth_uid)) {
error("Security violation, batch lau
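That check is essentially asking whether the RPC came from root or the configured SlurmUser, so a quick sanity check on the node logging the error is to compare the two (commands are generic, not from the original thread):

  scontrol show config | grep -i SlurmUser
  id 12002    # does this uid map to the expected account on this node?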
On 29/1/25 10:44 am, Steven Jones via slurm-users wrote:
"[2025-01-28T21:48:50.271] sched: Allocate JobId=4 NodeList=node4 #CPUs=1
Partition=debug
[2025-01-28T21:48:50.280] Killing non-startable batch JobId=4: Invalid
user id"
Looking at the source code it looks like that second error is repor
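"Invalid user id" at batch launch usually means the job owner's uid does not resolve the same way on the compute node as on the controller, so comparing the two is a reasonable first step (the username below is a placeholder):

  id <username>               # on the slurmctld host
  ssh node4 id <username>     # on the node the job was allocated to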
On 2/2/25 1:54 pm, Steven Jones via slurm-users wrote:
Thanks for the reply. I already went through this 🙁. I checked all
nodes, id works, as does an ssh login.
What is in your slurmd logs on that node?
On 2/2/25 3:46 pm, Steven Jones wrote:
I have never done an HPC before, it is all new to me, so I may be making
"newbie errors". The old HPC has been dumped on us, so I am trying to
build it "professionally" shall we say, i.e. documented, stable, and I will
train people to build it (all this with no
On 2/2/25 4:18 pm, Steven Jones via slurm-users wrote:
isn't it slurmd on the compute nodes?
It is, but as this check is (I think) happening on the compute node I
was wanting to check who slurmctld was running as.
The only other thought I have is what is in the compute node's slurm.conf
as
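A couple of checks that might help here (the config path is the common default and may differ on your install):

  ps -o user= -C slurmctld                       # on the controller: the account slurmctld runs as
  grep -i '^SlurmUser' /etc/slurm/slurm.conf     # on a compute node: what it thinks SlurmUser is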
On 9/12/24 5:44 pm, Steven Jones via slurm-users wrote:
[2024-12-09T23:38:56.645] error: Munge decode failed: Rewound credential
[2024-12-09T23:38:56.645] auth/munge: _print_cred: ENCODED: Tue Dec 10
23:38:30 2024
[2024-12-09T23:38:56.645] auth/munge: _print_cred: DECODED: Mon Dec 09
23:38:56
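A "Rewound credential", with the ENCODED timestamp ahead of the DECODED one as above, generally points at clock skew between the host that created the credential and the one rejecting it; a quick comparison (the node name is a placeholder) is:

  date -u; ssh <node> date -u    # the two should agree to within a few seconds
  chronyc tracking               # or: timedatectl status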
On 4/4/25 5:23 am, Michael Milton via slurm-users wrote:
Plain srun re-uses the existing Slurm allocation, and specifying
resources like --mem will just request them from the current job rather
than submitting a new one
srun does that as it sees all the various SLURM_* environment variables
Hiya!
On 16/4/25 12:56 am, lyz--- via slurm-users wrote:
I've tried version 23.11.10. It does work.
Oh that's wonderful, so glad it helped! It did seem quite odd that it
wasn't working for you before then. I wonder if this was a cgroups v1 vs
cgroups v2 thing?
All the best,
Chris
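If it is useful, a quick way to tell which cgroup version a node is running (a generic check, not something from the original thread):

  stat -fc %T /sys/fs/cgroup    # cgroup2fs means cgroup v2, tmpfs means v1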
On 22/2/25 9:04 pm, Zhang, Yuan via slurm-users wrote:
I got errors about missing perl modules when building slurm24.11.1 rpm
packages. Has anyone seen this error before? And how to fix it?
If my memory serves me right I would see those same errors when building
Slurm for Cray XC in a chroot
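For what it's worth, the usual cure on EL-flavoured systems is installing the Perl build dependencies that rpmbuild wants before rebuilding (the package names below are the common ones, not taken from the original report):

  dnf install perl perl-ExtUtils-MakeMaker
  rpmbuild -ta slurm-24.11.1.tar.bz2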
On 23/2/25 9:49 am, Zhang, Yuan via slurm-users wrote:
Thanks for your input. The error I see may not be the same as what you
had on the Cray system, but it shed some light on the troubleshooting
direction.
My pleasure, I'm so glad that helped point the way!
Best of luck on your endeavours.