Re: [slurm-users] Federation problems

2018-12-07 Thread rapier
And never mind. I didn't restart munge after ensuring both clusters had the same key. This specific problem was cleared up after I did that... Chris On 12/7/18 5:14 PM, rapier wrote: Hello all, I'm relatively new to Slurm but I've been tasked with looking at how Slurm federation might work. I've

[slurm-users] Federation problems

2018-12-07 Thread rapier
Hello all, I'm relatively new to Slurm but I've been tasked with looking at how Slurm federation might work. I've set up two very small Slurm clusters. Both of them seem to work quite well individually. However, when I try to set things up for federation, it rapidly breaks down. From what I've be

Re: [slurm-users] Wedged nodes from cgroups, OOM killer, and D state process

2018-12-07 Thread Ryan Novosielski
This is only somewhat relevant, but the scenario presents itself similarly. This is not in a scheduler environment, but we have an interactive server where ps would hang on certain tasks (top -bn1 is a way around that, BTW, if it’s hard to even find out what the process is). For us, it appeared t

Re: [slurm-users] Wedged nodes from cgroups, OOM killer, and D state process

2018-12-07 Thread Christopher Benjamin Coffey
Is this parameter applied to each cgroup? Or just the system itself? Seems like just the system itself. — Christopher Coffey High-Performance Computing Northern Arizona University 928-523-1167 On 12/4/18, 10:13 AM, "slurm-users on behalf of Christopher Benjamin Coffey" wrote: Interesti

Re: [slurm-users] How to allocate SMT cores

2018-12-07 Thread Maik Schmidt
I've used "numactl --show" to see which cores I actually got allocated from SLURM, and that's only 0-43 with task/cgroup. If I add task/affinity and use the parameter --hint=multithread, I can get all 0-175 with: -c 176 --hint=multithread. The problem then is that it *always* allocates the virt
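As a rough alternative to numactl for checking which logical CPUs a job step was actually given, the process affinity mask can be read from inside the job. The sketch below is only an illustration: the helper names `format_cpu_ranges` and `allowed_cpus` are made up for this example, and `os.sched_getaffinity` is available on Linux only.

```python
import os


def format_cpu_ranges(cpus):
    """Collapse an iterable of CPU ids into a string such as '0-3,8'."""
    ranges = []
    start = prev = None
    for cpu in sorted(cpus):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def allowed_cpus():
    """Return the set of CPUs this process may run on (Linux only)."""
    if not hasattr(os, "sched_getaffinity"):
        raise RuntimeError("os.sched_getaffinity is not available on this platform")
    return os.sched_getaffinity(0)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    # Run inside the job (for example via srun) to see the allocation.
    print("allowed CPUs:", format_cpu_ranges(allowed_cpus()))
```

Run inside an allocation, this prints the CPU list in the same compact range form used in the message above, which makes it easy to compare the task/cgroup and task/affinity cases.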

Re: [slurm-users] Use all cores with HT node

2018-12-07 Thread Sidiney Crescencio
I've found the problem: in my case I had set DefMemPerCPU too high, so when I requested 80 CPUs, for instance, there was not enough memory. It seems to be working fine now, but I'm still testing. Thanks, though. On Fri, 7 Dec 2018 at 15:04, Jeffrey Frey wrote: > I r
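The arithmetic behind this can be sketched as follows, assuming that a job submitted without an explicit memory request is charged DefMemPerCPU for each allocated CPU. The memory figures are hypothetical and do not come from the thread; only the 80-CPU request does.

```python
def fits_on_node(cpus: int, def_mem_per_cpu_mb: int, real_memory_mb: int) -> bool:
    """True if cpus * DefMemPerCPU does not exceed the node's RealMemory."""
    return cpus * def_mem_per_cpu_mb <= real_memory_mb


def max_def_mem_per_cpu(cpus: int, real_memory_mb: int) -> int:
    """Largest per-CPU default (MB) that still lets a job use all CPUs."""
    return real_memory_mb // cpus


# Hypothetical node: 192000 MB of RealMemory, 80 logical CPUs.
REAL_MEMORY_MB = 192000
print(fits_on_node(80, 4000, REAL_MEMORY_MB))    # 320000 MB needed: does not fit
print(max_def_mem_per_cpu(80, REAL_MEMORY_MB))   # upper bound for the default
```

With these example numbers, any per-CPU default above 2400 MB would make an 80-CPU request impossible to satisfy on that node.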

Re: [slurm-users] How to allocate SMT cores

2018-12-07 Thread Eli V
On Fri, Dec 7, 2018 at 7:53 AM Maik Schmidt wrote: > > I have found --hint=multithread, but this only works with task/affinity. > We use task/cgroup. Are there any downsides to activating both task > plugins at the same time? > > Best, Maik > > Am 07.12.18 um 13:33 schrieb Maik Schmidt: > > Hi all

Re: [slurm-users] Use all cores with HT node

2018-12-07 Thread Jeffrey Frey
I ran into this myself. By default Slurm allocates HTs as pairs (associated with a single core). The only adequate way I figured out to force HT = core is to make them full-fledged cores in the config: NodeName=csk007 CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=40 ThreadsPerCore=1 Rea
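A quick consistency check of the topology numbers in that node definition, as a minimal sketch. The helper name `logical_cpus` is made up for this example, and the "physical" layout of 20 cores with 2 threads each is an assumption about the hardware, not something stated in the message.

```python
def logical_cpus(boards: int, sockets_per_board: int,
                 cores_per_socket: int, threads_per_core: int) -> int:
    """Number of logical CPUs implied by a node's topology parameters."""
    return boards * sockets_per_board * cores_per_socket * threads_per_core


# Values quoted in the message: every hardware thread is declared as a core.
declared = logical_cpus(boards=1, sockets_per_board=2,
                        cores_per_socket=40, threads_per_core=1)

# Assumed real layout (hypothetical): 20 cores per socket, 2 threads per core.
assumed_physical = logical_cpus(boards=1, sockets_per_board=2,
                                cores_per_socket=20, threads_per_core=2)

print(declared, assumed_physical)
```

Both layouts give the same CPU count; the difference is only in whether Slurm treats each hardware thread as an independently allocatable core.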

Re: [slurm-users] How to allocate SMT cores

2018-12-07 Thread Maik Schmidt
I have found --hint=multithread, but this only works with task/affinity. We use task/cgroup. Are there any downsides to activating both task plugins at the same time? Best, Maik Am 07.12.18 um 13:33 schrieb Maik Schmidt: Hi all, we recently got ourselves some Power9 nodes with 4-way SMT. How

[slurm-users] How to allocate SMT cores

2018-12-07 Thread Maik Schmidt
Hi all, we recently got ourselves some Power9 nodes with 4-way SMT. However, other than using --exclusive, I cannot seem to find a way to make SLURM allocate all SMT threads for me. There simply does not seem to be a parameter for that. One might think that --threads-per-core would

Re: [slurm-users] Use all cores with HT node

2018-12-07 Thread Marcus Wagner
Hi Sidiney, not tested, but shouldn't SelectTypeParameters=CR_CPU_MEMORY do the trick? Best Marcus On 12/07/2018 12:12 PM, Sidiney Crescencio wrote: Hello All, I'm having trouble using HT on my compute nodes. I'm running Slurm 17.02.7. SelectTypeParameters=CR_CORE_MEMO

[slurm-users] Use all cores with HT enabled

2018-12-07 Thread Sidiney Crescencio
Hello All, I'm having trouble using HT on my compute nodes. I'm running Slurm 17.02.7. SelectTypeParameters=CR_CORE_MEMORY cgroup.conf: CgroupAutomount=yes CgroupReleaseAgentDir="/etc/slurm/cgroup" # cpuset subsystem ConstrainCores=yes TaskAffinity=no # memory subsystem ConstrainR

[slurm-users] Use all cores with HT node

2018-12-07 Thread Sidiney Crescencio
Hello All, I'm having trouble using HT on my compute nodes. I'm running Slurm 17.02.7. SelectTypeParameters=CR_CORE_MEMORY cgroup.conf: CgroupAutomount=yes CgroupReleaseAgentDir="/etc/slurm/cgroup" # cpuset subsystem ConstrainCores=yes TaskAffinity=no # memory subsystem ConstrainR

Re: [slurm-users] possible to set memory slack space before killing jobs?

2018-12-07 Thread Bjørn-Helge Mevik
Raymond Wan writes: > However, a more general question... I thought there is no fool-proof > way to watch the amount of memory a job is using. What if within the > script they ran another program using "nohup", for example. Wouldn't > slurm be unable to include the memory usage of that program?

Re: [slurm-users] possible to set memory slack space before killing jobs?

2018-12-07 Thread Bjørn-Helge Mevik
Eli V writes: > On Wed, Dec 5, 2018 at 5:04 PM Bjørn-Helge Mevik > wrote: >> >> I don't think Slurm has any facility for soft memory limits. >> >> But you could emulate it by simply configuring the nodes in slurm.conf >> with, e.g., 15% higher RealMemory value than what is actually available >> o
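The quoted suggestion amounts to simple arithmetic on the RealMemory value. A minimal sketch, assuming memory is given in MB and using a made-up node name and memory size; the helper `inflated_real_memory` is hypothetical and not part of Slurm:

```python
def inflated_real_memory(available_mb: int, slack_percent: int = 15) -> int:
    """RealMemory value that is slack_percent higher than what is available."""
    return available_mb * (100 + slack_percent) // 100


# Hypothetical node with 64000 MB actually available.
value = inflated_real_memory(64000, 15)
print(f"NodeName=node01 RealMemory={value}")
```

Note that with such a setting the scheduler believes the node has more memory than it really does, so the sum of job requests on a node can exceed what is physically there.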