On 5/6/20 11:30 AM, Dustin Lang wrote:
Hi,
Ubuntu has made MySQL 5.7.30 the default version. At least on Ubuntu 16.04,
this causes severe problems with slurmdbd (v17.x, 18.x, and 19.x; not sure
about 20).
I can confirm that it kills slurmdbd on Ubuntu 18.04 as well. I had compiled slurm
Anyone know if the new GPU support allows having a different number of GPUs per
node?
I found:
https://www.ch.cam.ac.uk/computing/slurm-usage
which mentions "SLURM does not support having varying numbers of GPUs per node
in a job yet."
I have a user with a particularly flexible code that would
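One avenue worth testing here (a sketch, not a confirmed fix for this poster's case) is Slurm's heterogeneous job support, present since 17.11, which lets each job component request its own GRES count. The program name below is hypothetical:

```shell
# Heterogeneous job: two components, each one node, with different GPU
# counts. The ":" separates the components on the srun command line.
# Assumes GresTypes=gpu is configured on the cluster; ./my_code is a
# placeholder for the user's application.
srun --nodes=1 --gres=gpu:4 : --nodes=1 --gres=gpu:2 ./my_code
```

In batch scripts the equivalent separator directive is "#SBATCH packjob" on the 17.11-19.05 releases (renamed "hetjob" in 20.02). Whether MPI ranks can span the components depends on the MPI stack, so this may or may not suit the user's code.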
On 5/15/19 12:34 AM, Barbara Krašovec wrote:
> It could be a problem with ARP cache.
>
> If the number of devices approaches 512, there is a kernel limitation in
> dynamic
> ARP-cache size and it can result in the loss of connectivity between nodes.
We have 162 compute nodes, a dozen or so file
My latest addition to a cluster results in a group of the same nodes
periodically getting listed as
"not-responding" and usually (but not always) recovering.
I increased logging up to debug3 and see messages like:
[2019-05-14T17:09:25.247] debug: Spawning ping agent for
bigmem[1-9],bm[1,7,9-13
On 11/13/18 9:39 PM, Kilian Cavalotti wrote:
> Hi Bill,
> There are a couple mentions of the same backtrace on the bugtracker,
> but that was a long time ago (namely
> https://bugs.schedmd.com/show_bug.cgi?id=1557 and
> https://bugs.schedmd.com/show_bug.cgi?id=1660, for Slurm 14.11). Weird
> to see
After being up since the second week of October or so, yesterday our Slurm
controller started segfaulting. It was compiled and run on Ubuntu 16.04.1.
Nov 12 14:31:48 nas-11-1 kernel: [2838306.311552] srvcn[9111]: segfault at 58 ip
004b51fa sp 7fbe270efb70 error 4 in slurmctld[40+eb000
On 10/16/18 3:38 AM, Bjørn-Helge Mevik wrote:
> Just a tip: Make sure that the kernel has support for constraining swap
> space. I believe we once had to reinstall one of our clusters once
> because we had forgotten to check that.
I tried starting slurmd with -D -v -v -v and got:
slurmd: debug:
Greetings,
I'm using ubuntu-18.04 and slurm-18.08.1 compiled from source.
I followed the directions on:
https://slurm.schedmd.com/cgroups.html
And:
https://slurm.schedmd.com/cgroup.conf.html
That resulted in:
$ cat slurm.conf | egrep -i "cgroup|CR_"
ProctrackType=proctrack/cgroup
TaskPlugin=t
Greetings all,
Just wanted to mention I've been building the newest Slurm on Ubuntu 18.04.
Gcc-7.3 is the default compiler, which means that the various dependencies
(munge, libevent, hwloc, netloc, pmix, etc) are already available and built with
gcc-7.3.
I carefully built slurm-17.11.6 + openmpi
On 05/08/2018 05:33 PM, Christopher Samuel wrote:
> On 09/05/18 10:23, Bill Broadley wrote:
>
>> It's possible of course that it's entirely an openmpi problem, I'll
>> be investigating and posting there if I can't find a solution.
>
> One of the cha
Greetings all,
I have slurm-17.11.5, pmix-1.2.4, and openmpi-3.0.1 working on several clusters.
I find srun handy for things like:
bill@headnode:~/src/relay$ srun -N 2 -n 2 -t 1 ./relay 1
c7-18 c7-19
size= 1, 16384 hops, 2 nodes in 0.03 sec ( 2.00 us/hop) 1953 KB/sec
Building was st