[slurm-users] Re: slurm-23.11.3-1 with X11 and zram causing permission errors: error: _forkexec_slurmstepd: slurmstepd failed to send return code got 0: Resource temporarily unavailable; Requeue of Jo

2024-02-24 Thread Chris Samuel via slurm-users
On 24/2/24 06:14, Robert Kudyba via slurm-users wrote: For now I just set it to chmod 777 on /tmp and that fixed the errors. Is there a better option? Traditionally /tmp and /var/tmp have been 1777 (that "1" being the sticky bit, originally invented to indicate that the OS should attempt to
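As a minimal sketch of the difference between mode 777 and the traditional 1777: on a directory, the sticky bit restricts deleting or renaming an entry to its owner, the directory's owner, or root, which plain 777 does not. The helper name `tmp_mode_report` is hypothetical and not part of Slurm or any tool mentioned in the thread.

```python
import os
import stat


def tmp_mode_report(mode: int) -> dict:
    """Summarise the permission bits of a directory mode (hypothetical helper)."""
    perms = stat.S_IMODE(mode)  # drop the file-type bits, keep permission bits
    return {
        "octal": format(perms, "04o"),
        "sticky": bool(perms & stat.S_ISVTX),        # the leading "1" in 1777
        "world_writable": bool(perms & stat.S_IWOTH),
    }


if __name__ == "__main__":
    # Report the current modes of the two directories named in the reply.
    for path in ("/tmp", "/var/tmp"):
        if os.path.isdir(path):
            print(path, tmp_mode_report(os.stat(path).st_mode))
```

A directory reporting "0777" here is world-writable without the sticky bit; restoring the traditional mode would be `os.chmod(path, 0o1777)`, run as root.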

[slurm-users] Re: canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?

2024-02-27 Thread Chris Samuel via slurm-users
On 26/2/24 12:27 am, Josef Dvoracek via slurm-users wrote: What is the recommended way to run longer interactive job at your systems? We provide NX for our users and also access via JupyterHub. We also have high priority QOS's intended for interactive use for rapid response, but they are cap

[slurm-users] Re: REST API - get_user_environment

2024-08-27 Thread Chris Samuel via slurm-users
On 15/8/24 10:55 am, jpuerto--- via slurm-users wrote: Any ideas on whether there's a way to mirror this functionality in v0.0.40? Sorry for not seeing this sooner, I don't, I'm afraid! All the best, Chris -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an ema

[slurm-users] Re: REST API - get_user_environment

2024-08-27 Thread Chris Samuel via slurm-users
On 22/8/24 11:18 am, jpuerto--- via slurm-users wrote: Do you have a link to that code? Haven't had any luck finding that repo It's here (on the 23.11 branch): https://github.com/SchedMD/slurm/tree/slurm-23.11/src/slurmrestd/plugins/openapi/dbv0.0.38

[slurm-users] Re: REST API - get_user_environment

2024-08-27 Thread Chris Samuel via slurm-users
On 27/8/24 10:26 am, jpuerto--- via slurm-users wrote: Is anyone in contact with the development team? Folks with a support contract can submit bugs at https://support.schedmd.com/ I feel that this is pretty basic functionality that was removed from the REST API without warning. Consideri

[slurm-users] Re: Spread a multistep job across clusters

2024-08-27 Thread Chris Samuel via slurm-users
On 26/8/24 8:40 am, Di Bernardini, Fabio via slurm-users wrote: Hi everyone, for accounting reasons, I need to create only one job across two or more federated clusters with two or more srun steps. The limitations for heterogeneous jobs say: https://slurm.schedmd.com/heterogeneous_jobs.html#li

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Chris Samuel via slurm-users
On 2/2/25 2:46 pm, Steven Jones via slurm-users wrote: [2025-01-30T19:45:29.024] error: Security violation, ping RPC from uid 12002 Looking at the code that seems to come from this code: if (!_slurm_authorized_user(msg->auth_uid)) { error("Security violation, batch lau

[slurm-users] Re: RHEL8.10 V slurmctld

2025-02-02 Thread Chris Samuel via slurm-users
On 29/1/25 10:44 am, Steven Jones via slurm-users wrote: "[2025-01-28T21:48:50.271] sched: Allocate JobId=4 NodeList=node4 #CPUs=1 Partition=debug [2025-01-28T21:48:50.280] Killing non-startable batch JobId=4: Invalid user id" Looking at the source code it looks like that second error is repor

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Chris Samuel via slurm-users
On 2/2/25 1:54 pm, Steven Jones via slurm-users wrote: Thanks for the reply.  I already went through this 🙁.  I checked all nodes, id works as does a ssh login. What is in your slurmd logs on that node?

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Chris Samuel via slurm-users
On 2/2/25 3:46 pm, Steven Jones wrote: I have never done a HPC before, it is all new to me so I can be making "newbie errors".   The old HPC has been dumped on us so I am trying to build it "professionally" shall we say  ie documented, stable and I will train ppl to build it  (all this with no

[slurm-users] Re: Fw: Re: RHEL8.10 V slurmctld

2025-02-02 Thread Chris Samuel via slurm-users
On 2/2/25 4:18 pm, Steven Jones via slurm-users wrote: isn't it slurmd on the compute nodes? It is, but as this check is (I think) happening on the compute node I was wanting to check who slurmctld was running as. The only other thought I have is what is in the compute nodes slurm.conf as

[slurm-users] Re: node3 not working - down

2024-12-09 Thread Chris Samuel via slurm-users
On 9/12/24 5:44 pm, Steven Jones via slurm-users wrote: [2024-12-09T23:38:56.645] error: Munge decode failed: Rewound credential [2024-12-09T23:38:56.645] auth/munge: _print_cred: ENCODED: Tue Dec 10 23:38:30 2024 [2024-12-09T23:38:56.645] auth/munge: _print_cred: DECODED: Mon Dec 09 23:38:56
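A rough sketch of how the two timestamps in that log excerpt compare, assuming the "Rewound credential" error means the credential was encoded at a time the decoding host considers to be in the future. The DECODED value is cut off in the excerpt, so the sketch uses the log line's own timestamp as the decoder's clock instead; the function name `clock_skew_seconds` is hypothetical.

```python
from datetime import datetime


def clock_skew_seconds(encoded: str, decoder_log_time: str) -> float:
    """Seconds by which the encoding host's clock is ahead of the decoder's.

    encoded          -- time in the format munge prints after "ENCODED:"
    decoder_log_time -- ISO timestamp from the start of the log line
    """
    enc = datetime.strptime(encoded, "%a %b %d %H:%M:%S %Y")
    dec = datetime.fromisoformat(decoder_log_time)
    return (enc - dec).total_seconds()


if __name__ == "__main__":
    skew = clock_skew_seconds("Tue Dec 10 23:38:30 2024", "2024-12-09T23:38:56.645")
    print(f"encoder ahead of decoder by {skew / 3600:.2f} hours")
```

A positive result of roughly a day, as with these values, would point at the system clocks of the two hosts rather than at munge itself.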

[slurm-users] Re: Run a command in Slurm with all streams and signals connected to the submitting command

2025-04-04 Thread Chris Samuel via slurm-users
On 4/4/25 5:23 am, Michael Milton via slurm-users wrote: Plain srun re-uses the existing Slurm allocation, and specifying resources like --mem will just request them from the current job rather than submitting a new one srun does that as it sees all the various SLURM_* environment variables
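To illustrate the point about SLURM_* variables, here is a small sketch that separates them from the rest of an environment. The reply is cut off above, so whether clearing these variables is what it goes on to recommend, or is enough to make srun submit a new job, is an assumption; the helper names are hypothetical.

```python
import os


def slurm_vars(env: dict) -> dict:
    """Return only the SLURM_* variables that srun would see."""
    return {k: v for k, v in env.items() if k.startswith("SLURM_")}


def without_slurm_vars(env: dict) -> dict:
    """Return a copy of env with every SLURM_* variable removed."""
    return {k: v for k, v in env.items() if not k.startswith("SLURM_")}


if __name__ == "__main__":
    # Inside an allocation this lists the variables inherited from the job.
    for name, value in sorted(slurm_vars(dict(os.environ)).items()):
        print(f"{name}={value}")
```

The cleaned dictionary could be passed as `env=` to `subprocess.run` when launching a nested command, which leaves the calling shell's environment untouched.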

[slurm-users] Re: [EXT] Re: Issue with Enforcing GPU Usage Limits in Slurm

2025-04-16 Thread Chris Samuel via slurm-users
Hiya! On 16/4/25 12:56 am, lyz--- via slurm-users wrote: I've tried version 23.11.10. It does work. Oh that's wonderful, so glad it helped! It did seem quite odd that it wasn't working for you before then. I wonder if this was a cgroups v1 vs cgroups v2 thing? All the best, Chris -- Chris

[slurm-users] Re: Please help - Building Slurm-24.11.1 Failed

2025-02-23 Thread Chris Samuel via slurm-users
On 22/2/25 9:04 pm, Zhang, Yuan via slurm-users wrote: I got errors about missing perl modules when building slurm24.11.1 rpm packages.  Has anyone seen this error before? And how to fix it? If my memory serves me right I would see those same errors when building Slurm for Cray XC in a chroot

[slurm-users] Re: Please help - Building Slurm-24.11.1 Failed

2025-02-23 Thread Chris Samuel via slurm-users
On 23/2/25 9:49 am, Zhang, Yuan via slurm-users wrote: Thanks for your input. The error I see may not be the same as what you had on the Cray system, but it shed some lights on the troubleshooting direction. My pleasure, I'm so glad that helped point the way! Best of luck on your endeavours.