[slurm-users] Re: srun weirdness

2024-05-17 Thread Patryk Bełzak via slurm-users
make sense? > > I also missed that setting in slurm.conf so good to know it is possible to > change the default behaviour. > > Tom > > From: Patryk Bełzak via slurm-users > Date: Friday, 17 May 2024 at 10:15 > To: Dj Merrill > Cc: slurm-users@lists.schedmd.co

[slurm-users] Re: srun weirdness

2024-05-17 Thread greent10--- via slurm-users
slurm.conf so good to know it is possible to change the default behaviour. Tom From: Patryk Bełzak via slurm-users Date: Friday, 17 May 2024 at 10:15 To: Dj Merrill Cc: slurm-users@lists.schedmd.com Subject: [slurm-users] Re: srun weirdness

[slurm-users] Re: srun weirdness

2024-05-17 Thread Patryk Bełzak via slurm-users
Hi, I wonder where these problems come from; perhaps I am missing something, but we have never had such issues with limits, since we set them on the worker nodes in /etc/security/limits.d/99-cluster.conf: ``` * soft memlock 4086160 #Allow more Memory Locks for MPI * hard memlock
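For reference, a fuller sketch of such a drop-in file; the soft value is from the message above, while the hard-limit value is hypothetical since the original excerpt is truncated:

```
# /etc/security/limits.d/99-cluster.conf
# Allow more memory locks for MPI (the hard value below is illustrative)
*   soft   memlock   4086160
*   hard   memlock   4086160
```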

[slurm-users] Re: srun weirdness

2024-05-15 Thread Dj Merrill via slurm-users
I completely missed that, thank you! -Dj Laura Hild via slurm-users wrote: PropagateResourceLimitsExcept won't do it? Sarlo, Jeffrey S wrote: You might look at the PropagateResourceLimits and PropagateResourceLimitsExcept settings in slurm.conf
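For reference, a minimal slurm.conf sketch of the fix being discussed; the MEMLOCK entry is an assumption here, since the thread excerpts do not say which limits were excluded:

```
# slurm.conf: do not propagate the submitting shell's memlock limit,
# so jobs keep the limits configured on the compute nodes
PropagateResourceLimitsExcept=MEMLOCK
```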

[slurm-users] Re: srun weirdness

2024-05-15 Thread Laura Hild via slurm-users
PropagateResourceLimitsExcept won't do it? From: Dj Merrill via slurm-users Sent: Wednesday, 15 May 2024 09:43 To: slurm-users@lists.schedmd.com Subject: [EXTERNAL] [slurm-users] Re: srun weirdness Thank you Hermann and Tom! That was it. The new cl

[slurm-users] Re: srun weirdness

2024-05-15 Thread Dj Merrill via slurm-users
-Original Message- From: Hermann Schwärzler via slurm-users Sent: Wednesday, May 15, 2024 9:45 AM To: slurm-users@lists.schedmd.com Subject: [slurm-users] Re: srun weirdness

[slurm-users] Re: srun weirdness

2024-05-15 Thread greent10--- via slurm-users
Phone: +44 (0)29 208 70734 E-mail: green...@caerdydd.ac.uk Website: http://www.caerdydd.ac.uk/arcca -Original Message- From: Hermann Schwärzler via slurm-users Sent: Wednesday, May 15, 2024 9:45 AM To: slurm-users@lists.schedmd.com Subject: [slurm-users] Re: srun weirdness

[slurm-users] Re: srun weirdness

2024-05-15 Thread Hermann Schwärzler via slurm-users
Hi Dj, this could be a memory-limits-related problem. What is the output of ulimit -l -m -v -s in both interactive job-shells? You are using cgroups-v1 now, right? In that case, what is the respective content of /sys/fs/cgroup/memory/slurm_*/uid_$(id -u)/job_*/memory.limit_in_bytes in both shells
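A minimal sketch of the comparison Hermann suggests, run in each interactive shell (the cgroup path assumes cgroups-v1, as in his message):

```
# Show the limits that most often differ between ssh and srun shells
ulimit -l -m -v -s
# Show the memory limit Slurm's cgroup applied to this job
cat /sys/fs/cgroup/memory/slurm_*/uid_$(id -u)/job_*/memory.limit_in_bytes
```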

[slurm-users] Re: srun weirdness

2024-05-14 Thread Feng Zhang via slurm-users
Do you have containers configured? On Tue, May 14, 2024 at 3:57 PM Feng Zhang wrote: > > Not sure, very strange, though the two linux-vdso.so.1 entries look different: > > [deej@moose66 ~]$ ldd /mnt/local/ollama/ollama > linux-vdso.so.1 (0x7ffde81ee000) > > > [deej@moose66 ~]$ ldd /mnt/local/ollama

[slurm-users] Re: srun weirdness

2024-05-14 Thread Feng Zhang via slurm-users
Not sure, very strange, though the two linux-vdso.so.1 entries look different: [deej@moose66 ~]$ ldd /mnt/local/ollama/ollama linux-vdso.so.1 (0x7ffde81ee000) [deej@moose66 ~]$ ldd /mnt/local/ollama/ollama linux-vdso.so.1 (0x7fffa66ff000) Best, Feng On Tue, May 14, 2024 at 3:43 PM D
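One note on the output above: linux-vdso.so.1 is a kernel-provided mapping placed at a randomized address in each process (ASLR), so differing hex values between two runs are expected and do not by themselves indicate a different binary. A quick way to confirm, using any binary:

```
# The vdso address changes on every invocation under ASLR
ldd /bin/true | grep vdso
ldd /bin/true | grep vdso
```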

[slurm-users] Re: srun weirdness

2024-05-14 Thread Dj Merrill via slurm-users
Hi Feng, Thank you for replying. It is the same binary on the same machine that fails. If I ssh to a compute node on the second cluster, it works fine. It fails when running in an interactive shell obtained with srun on that same compute node. I agree that it seems like a runtime environment
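Since the same binary on the same node behaves differently under ssh and srun, one way to narrow this down (a sketch; the file names are arbitrary) is to diff the two runtime environments:

```
# In the ssh shell on the compute node
env | sort > /tmp/env.ssh; ulimit -a > /tmp/ulimit.ssh
# In the srun interactive shell on the same node
env | sort > /tmp/env.srun; ulimit -a > /tmp/ulimit.srun
# Compare
diff /tmp/env.ssh /tmp/env.srun
diff /tmp/ulimit.ssh /tmp/ulimit.srun
```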

[slurm-users] Re: srun weirdness

2024-05-14 Thread Feng Zhang via slurm-users
This looks more like a runtime-environment issue. Check the binaries: run ldd /mnt/local/ollama/ollama on both clusters; comparing the output may give some hints. Best, Feng On Tue, May 14, 2024 at 2:41 PM Dj Merrill via slurm-users wrote: > > I'm running into a strange issue and I'm hoping anoth
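A minimal sketch of that comparison; the node names are hypothetical:

```
# Capture the dynamic-linker resolution on a node of each cluster
ssh node-cluster1 'ldd /mnt/local/ollama/ollama' > /tmp/ldd.c1
ssh node-cluster2 'ldd /mnt/local/ollama/ollama' > /tmp/ldd.c2
diff /tmp/ldd.c1 /tmp/ldd.c2
```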