Re: [slurm-users] SLUG '23 Registration and Call for Papers

2023-05-30 Thread Sean Caron
Hi Victoria, Am I correct in assuming that the proceedings will no longer be livestreamed and that there is no option to attend remotely even for pay? Best, Sean On Tue, May 30, 2023 at 1:41 PM Victoria Hobson wrote: > It's time for SLUG '23! > > Registration for Slurm User Group 2023 (SLUG)

Re: [slurm-users] Dell <> GPU compatibility matrix?

2022-10-27 Thread Sean Caron
Hi Chip, Here's the page I've been using for reference: https://www.dell.com/en-us/dt/servers/server-accelerators.htm Best, Sean On Thu, Oct 27, 2022 at 11:03 AM Chip Seraphine wrote: > > We have a cluster of 1U dells (R640s and R650s) and we’ve been asked to > install GPUs in them, specifi

Re: [slurm-users] Spurious OOM-kills with cgroups on 20.11.8?

2021-08-10 Thread Sean Caron
ompleted. The OOM came when the job exited and was a false error. > > > > Also, there are several bug reports open right now about an issue similar > to what you have described. You can go to bugs.schedmd.com to look at > those bug reports. > > > > -Roger > > >

[slurm-users] Spurious OOM-kills with cgroups on 20.11.8?

2021-08-10 Thread Sean Caron
Hi all, Has anyone else observed jobs getting OOM-killed in 20.11.8 with cgroups that ran fine in previous versions like 20.10? I've had a few reports from users after upgrading maybe six weeks ago that their jobs are getting OOM-killed when they haven't changed anything and the job ran to comple

Re: [slurm-users] Restore Last JOBID After Reinstall of Slurm Master Node?

2018-12-23 Thread Sean Caron
On Mon, Dec 24, 2018 at 12:13 AM Hanby, Mike wrote: > Howdy, > > > > We installed a new server to take over the duties of the Slurm master. I > imported our accounting database into MySQL, copied config files, etc. > > > > Apparently I missed the “file” that contains the last (or is it next) > JO

Re: [slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

2018-05-21 Thread Sean Caron
Just wanted to follow up. In addition to passing all traffic to the SLURM controller, opened port 6818/TCP to all other compute nodes and this seems to have resolved the issue. Thanks again, Matthieu! Best, Sean On Thu, May 17, 2018 at 8:06 PM, Sean Caron wrote: > Awesome tip. Thanks so m

Re: [slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

2018-05-17 Thread Sean Caron
d recommend to use the first one. > > HTH > Matthieu > > PS : you can look at that presentation for a few details on the > communication logic : > https://slurm.schedmd.com/SUG14/message_aggregation.pdf > > > > 2018-05-17 22:21 GMT+02:00 Sean Caron : > >>

Re: [slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

2018-05-17 Thread Sean Caron
n them and there are no issues with the nodes getting to slurm.conf. Best, Sean On Thu, May 17, 2018 at 1:21 PM, Patrick Goetz wrote: > Does your SMS have a dedicated interface for node traffic? > > On 05/16/2018 04:00 PM, Sean Caron wrote: > >> I see some chatter on 6818/TCP

Re: [slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

2018-05-16 Thread Sean Caron
les and look at what traffic is actually > being blocked? > > On Wed, May 16, 2018 at 11:11 AM Sean Caron wrote: > >> Hi all, >> >> Does anyone use SLURM in a scenario where there is an iptables firewall >> on the compute nodes on the same network it uses to c

[slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

2018-05-16 Thread Sean Caron
Hi all, Does anyone use SLURM in a scenario where there is an iptables firewall on the compute nodes on the same network it uses to communicate with the SLURM controller and DBD machine? I have the very basic situation where ... 1. There is no iptables firewall enabled at all on the SLURM contro

Re: [slurm-users] FSU & Slurm

2018-04-13 Thread Sean Caron
est, Sean On Fri, Apr 13, 2018 at 12:27 PM, Patrick Goetz wrote: > On 04/11/2018 02:35 PM, Sean Caron wrote: > >> As a protest to asking questions on this list and getting solicitations >> for pay-for support, let me give you some advice for free :) >> >> > Now, now. Paid support is how they keep the project going. You like > using Slurm, right? > > > >

Re: [slurm-users] FSU & Slurm

2018-04-11 Thread Sean Caron
Hi Matt, As a protest to asking questions on this list and getting solicitations for pay-for support, let me give you some advice for free :) If you look at your slurm.conf you'll see there are two directories that your slurm user and group need to have write access to. One is whatever you confi

Re: [slurm-users] SLURM 17.02.9 slurmctld unresponsive with server_thread_count over limit, waiting in syslog

2017-11-08 Thread Sean Caron
> seems to fix the issue. > > Otherwise this can happen due to massive traffic to the slurmctld. You > can try using the defer option for the SchedulerParameters. That slows down > the scheduler so it can handle the additional load. > > -Paul Edmon- > > > > On 11/8/

[slurm-users] SLURM 17.02.9 slurmctld unresponsive with server_thread_count over limit, waiting in syslog

2017-11-08 Thread Sean Caron
Hi all, I see SLURM 17.02.9 slurmctld hang or become unresponsive every few days with the message in syslog: server_thread_count over limit (256), waiting I believe from the user perspective they see "Socket timed out on send/recv operation". Slurmctld never seems to recover once it's in this st