Thanks Paddy,
just something learned again ;)
Best
Marcus
On 11/08/2018 05:07 PM, Paddy Doyle wrote:
Hi all,
It looks like we can use the API to avoid having to manually parse the '2='
entry from the stats{tres_usage_in_max} value.
I've submitted a bug report and patch:
https://bugs.schedmd.com/show_bug.cgi?id=6004
Can anyone shed some light on where the _virtual_ memory limit comes from?
We're getting jobs killed with the message
slurmstepd: error: Step 3664.0 exceeded virtual memory limit (79348101120 > 72638634393), being killed
Is this a limit that's dictated by cgroup.conf or by some srun option (like
On Friday, 9 November 2018 5:38:22 AM AEDT Brian Andrus wrote:
> Where, slurmctld is not picking up new accounts unless it is restarted.
This is usually because slurmdbd cannot connect back to the slurmctld on the
management node to make the RPC telling it that a new account/user/etc. has
appeared.
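A hedged sketch of how to sanity-check that callback path (the hostname and port below are examples, not taken from this thread):

# Each cluster registered in slurmdbd should list the ControlHost/ControlPort
# that slurmdbd calls back to when a new association is added:
sacctmgr show cluster format=Cluster,ControlHost,ControlPort,RPC
# From the slurmdbd host, confirm that port is actually reachable
# (6817 is the default SlurmctldPort; adjust to match your slurm.conf):
nc -zv controller.example.com 6817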
We use sssd with realmd
enumeration is off.
Brian Andrus
On 11/8/2018 11:26 AM, Marcin Stolarek wrote:
I have had a very similar issue for quite some time and have been unable to find
its root cause. Are you using sssd and AD as a data source, with only a subtree
of entries searched? That is my case.
Did you disable user enumeration? That is also what I have. I didn't find any
evidence that it's related, but... maybe.
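For reference, a hedged illustration of the kind of sssd/AD setup being described; the domain name and search base are invented:

# Excerpt from /etc/sssd/sssd.conf (illustrative only):
#   [domain/example.com]
#   id_provider = ad
#   enumerate = false                             # user/group enumeration disabled
#   ldap_search_base = OU=HPC,DC=example,DC=com   # only a subtree of entries searched
# Quick check of the relevant settings on a node:
grep -E 'enumerate|search_base' /etc/sssd/sssd.conf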
All,
I am seeing what looks like the same issue as
https://bugs.schedmd.com/show_bug.cgi?id=2119,
where slurmctld is not picking up new accounts unless it is restarted.
I have 4 clusters (non-federated), all using the same slurmdbd
When I added an association for user name=me cluster=DevOps
accou
Hi all,
It looks like we can use the API to avoid having to manually parse the '2='
entry from the stats{tres_usage_in_max} value.
I've submitted a bug report and patch:
https://bugs.schedmd.com/show_bug.cgi?id=6004
The minimal changes needed are in the attached seff.patch.
Hope that helps
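For anyone who wants to look at the raw accounting data that seff reads, a hedged example (the job ID is made up, and the exact output format may differ between Slurm versions):

# Per-step maximum TRES usage as recorded in the accounting database:
sacct -j 1234 --format=JobID,TRESUsageInMax%80
# In sacct output the memory entry appears as 'mem=...'; in the Perl API's
# stats{tres_usage_in_max} string the same quantity is keyed by TRES id 2.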
Hello all.
I'm seeing something strange related to group memberships and how they affect
Slurm. I'd appreciate any ideas to help understand what is going on.
It appears that only the primary group of the user is propagated when
Slurm runs a job. The additional group memberships vanish. This is not
expected
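A hedged way to see the difference (plain diagnostic commands, not a fix; run them from a login node):

id          # supplementary groups as seen outside Slurm
srun id     # supplementary groups as seen inside a job step
# If the second listing only shows the primary group, the supplementary
# memberships are being lost between submission and job execution.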
Thanks - that's an awesome, yet horrible, hack :)
Noam
> On Nov 8, 2018, at 3:26 AM, Josep Manel Andrés Moscardó
> wrote:
>
> Hi,
> Somebody else gave me this piece of code (I hope he doesn't mind me sharing
> it :) , at least it
Hi Miguel,
this is because SchedMD changed the stats field: rss_max no longer exists
(cf. line 225 of seff).
You need to evaluate the field stats{tres_usage_in_max} and take the value
after '2='; however, this is the memory value in bytes rather than kbytes,
so it should be divided by 1024.
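A minimal sketch of that manual parsing in shell, assuming the usual comma-separated 'id=value' form (this is not the actual seff.patch, and the example string is made up):

tres_usage_in_max="1=1000000,2=79348101120,3=0"    # TRES id 2 is memory, in bytes
mem_bytes=$(echo "$tres_usage_in_max" | tr ',' '\n' | awk -F= '$1 == 2 {print $2}')
mem_kbytes=$(( mem_bytes / 1024 ))                  # bytes -> kbytes
echo "max RSS: ${mem_kbytes} KB"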
Hi, and thanks for all your answers, and sorry for the delay in my reply.
Yesterday I installed Slurm 18.08.3 on the controller machine to check
whether the seff command works correctly with this latest release. The
behavior has improved, but I still receive an error message:
# /usr/local/slurm-18.
Hi,
Somebody else gave me this piece of code (I hope he doesn't mind me
sharing it :) ), at least that is how they do it:
#!/bin/bash
#SBATCH --signal=B:USR1@300   # <-- This will make Slurm send signal USR1
                              #     to the bash process 300 seconds before the time limit
#SBATCH -t 00:06:00
resubmit()
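The snippet is cut off in the archive, so here is a hedged sketch of how the full pattern usually looks; the body of resubmit() and the workload name are assumptions, not the original poster's code:

#!/bin/bash
#SBATCH --signal=B:USR1@300   # send USR1 to the batch shell 300 s before the time limit
#SBATCH -t 00:06:00

resubmit() {
    echo "Time limit approaching, resubmitting this script"
    sbatch "$0"               # assumption: simply requeue a fresh copy of the same script
    exit 0
}
trap resubmit USR1            # run resubmit() when Slurm delivers the signal

./long_running_task &         # placeholder for the real workload; run it in the
wait                          # background so the trap can fire promptly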