Thanks Paddy,
just something learned again ;)
Best
Marcus
On 11/08/2018 05:07 PM, Paddy Doyle wrote:
Hi all,
It looks like we can use the API to avoid having to manually parse the '2='
entry from the stats{tres_usage_in_max} value.
I've submitted a bug report and patch:
https://bugs.schedmd.com/show_bug.cgi?id=6004
Can anyone shed some light on where the _virtual_ memory limit comes from?
We're getting jobs killed with the message
slurmstepd: error: Step 3664.0 exceeded virtual memory limit (79348101120 > 72638634393), being killed
Is this a limit that's dictated by cgroup.conf or by some srun option (like
On Friday, 9 November 2018 5:38:22 AM AEDT Brian Andrus wrote:
> Where, slurmctld is not picking up new accounts unless it is restarted.
This is usually because slurmdbd cannot connect back to the slurmctld on the
management node to make the RPC telling it that a new account/user/etc. has
appeared.
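A hedged sketch of how to sanity-check that callback path (the hostname and port below are examples, not taken from this thread):

# Each cluster registered in slurmdbd should list the ControlHost/ControlPort
# that slurmdbd calls back to when a new association is added:
sacctmgr show cluster format=Cluster,ControlHost,ControlPort,RPC
# From the slurmdbd host, confirm that port is actually reachable
# (6817 is the default SlurmctldPort; adjust to match your slurm.conf):
nc -zv controller.example.com 6817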
We use sssd with realmd
enumeration is off.
Brian Andrus
On 11/8/2018 11:26 AM, Marcin Stolarek wrote:
I have had a very similar issue for quite some time and have been unable to find
its root cause. Are you using sssd and AD as a data source, with only a subtree
of entries searched? That is my case.
Did you disable user enumeration? That is also what I have. I didn't find any
evidence that it's related, but... maybe.
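For reference, a hedged illustration of the kind of sssd/AD setup being described; the domain name and search base are invented:

# Excerpt from /etc/sssd/sssd.conf (illustrative only):
#   [domain/example.com]
#   id_provider = ad
#   enumerate = false                             # user/group enumeration disabled
#   ldap_search_base = OU=HPC,DC=example,DC=com   # only a subtree of entries searched
# Quick check of the relevant settings on a node:
grep -E 'enumerate|search_base' /etc/sssd/sssd.conf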
All,
I am seeing what looks like the same issue as
https://bugs.schedmd.com/show_bug.cgi?id=2119,
where slurmctld is not picking up new accounts unless it is restarted.
I have 4 clusters (non-federated), all using the same slurmdbd
When I added an association for user name=me cluster=DevOps
accou
Hi all,
It looks like we can use the API to avoid having to manually parse the '2='
entry from the stats{tres_usage_in_max} value.
I've submitted a bug report and patch:
https://bugs.schedmd.com/show_bug.cgi?id=6004
The minimal changes needed are in the attached seff.patch.
Hope that helps
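For anyone who wants to look at the raw accounting data that seff reads, a hedged example (the job ID is made up, and the exact output format may differ between Slurm versions):

# Per-step maximum TRES usage as recorded in the accounting database:
sacct -j 1234 --format=JobID,TRESUsageInMax%80
# In sacct output the memory entry appears as 'mem=...'; in the Perl API's
# stats{tres_usage_in_max} string the same quantity is keyed by TRES id 2.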
Hello all.
I'm seeing something strange related to group memberships and how they affect
Slurm. I'd appreciate any ideas to help understand what is going on.
It appears that only the primary group of the user is propagated when
Slurm runs a job. The additional group memberships vanish. This is not
expected
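A hedged way to see the difference (plain diagnostic commands, not a fix; run them from a login node):

id          # supplementary groups as seen outside Slurm
srun id     # supplementary groups as seen inside a job step
# If the second listing only shows the primary group, the supplementary
# memberships are being lost between submission and job execution.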
Thanks - that's an awesome, yet horrible, hack :)
Noam
> On Nov 8, 2018, at 3:26 AM, Josep Manel Andrés Moscardó
> wrote:
>
> Hi,
> Somebody else gave me this piece of code (I hope he doesn't mind me sharing
> it :) , at least it
Hi Miguel,
this is because SchedMD changed the stats field: rss_max no longer exists
(cf. line 225 of seff).
You need to evaluate the field stats{tres_usage_in_max} and take the value
after '2='; however, this is the memory value in bytes rather than kbytes,
so it should be divided by 1024.
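A minimal sketch of that manual parsing in shell, assuming the usual comma-separated 'id=value' form (this is not the actual seff.patch, and the example string is made up):

tres_usage_in_max="1=1000000,2=79348101120,3=0"    # TRES id 2 is memory, in bytes
mem_bytes=$(echo "$tres_usage_in_max" | tr ',' '\n' | awk -F= '$1 == 2 {print $2}')
mem_kbytes=$(( mem_bytes / 1024 ))                  # bytes -> kbytes
echo "max RSS: ${mem_kbytes} KB"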
Hi, and thanks for all your answers, and sorry for the delay in my reply.
Yesterday I installed Slurm 18.08.3 on the controller machine to check
whether the seff command works correctly with this latest release. The
behavior has improved, but I still receive an error message:
# /usr/local/slurm-18.
Hi,
Somebody else gave me this piece of code (I hope he doesn't mind me
sharing it :) ), at least that is how they do it:
#!/bin/bash
#SBATCH --signal=B:USR1@300   # <-- This will make Slurm send signal USR1
                              #     to the bash process 300 seconds before the time limit
#SBATCH -t 00:06:00
resubmit()
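The snippet is cut off in the archive, so here is a hedged sketch of how the full pattern usually looks; the body of resubmit() and the workload name are assumptions, not the original poster's code:

#!/bin/bash
#SBATCH --signal=B:USR1@300   # send USR1 to the batch shell 300 s before the time limit
#SBATCH -t 00:06:00

resubmit() {
    echo "Time limit approaching, resubmitting this script"
    sbatch "$0"               # assumption: simply requeue a fresh copy of the same script
    exit 0
}
trap resubmit USR1            # run resubmit() when Slurm delivers the signal

./long_running_task &         # placeholder for the real workload; run it in the
wait                          # background so the trap can fire promptly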