Hi Victoria,
Am I correct in assuming that the proceedings will no longer be
livestreamed, and that there is no option to attend remotely, even for pay?
Best,
Sean
On Tue, May 30, 2023 at 1:41 PM Victoria Hobson
wrote:
> It's time for SLUG '23!
>
> Registration for Slurm User Group 2023 (SLUG)
Hi Chip,
Here's the page I've been using for reference:
https://www.dell.com/en-us/dt/servers/server-accelerators.htm
Best,
Sean
On Thu, Oct 27, 2022 at 11:03 AM Chip Seraphine
wrote:
>
> We have a cluster of 1U Dells (R640s and R650s) and we’ve been asked to
> install GPUs in them, specifi
completed. The OOM came when the job exited and was a false error.
>
>
>
> Also, there are several bug reports open right now about an issue similar
> to what you have described. You can go to bugs.schedmd.com to look at
> those bug reports.
>
>
>
> -Roger
>
>
>
Hi all,
Has anyone else observed jobs getting OOM-killed under cgroups in 20.11.8
that ran fine in previous versions like 20.02?
Since we upgraded roughly six weeks ago, I've had a few reports from users
that their jobs are getting OOM-killed even though they haven't changed
anything, and the same jobs previously ran to completion.
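For anyone comparing notes, I've been checking what accounting says the
jobs actually used versus what they requested (a sketch; 1234567 is a
placeholder job ID, and MaxRSS shows up on the job steps):

    sacct -j 1234567 --format=JobID,State,ExitCode,ReqMem,MaxRSS,Elapsed

If MaxRSS is well under ReqMem for a job that still got OOM-killed, that
would point at the cgroup accounting rather than the job itself.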
On Mon, Dec 24, 2018 at 12:13 AM Hanby, Mike wrote:
> Howdy,
>
>
>
> We installed a new server to take over the duties of the Slurm master. I
> imported our accounting database into MySQL, copied config files, etc.
>
>
>
> Apparently I missed the “file” that contains the last (or is it next)
> JobID
Just wanted to follow up. In addition to passing all traffic to the SLURM
controller, I opened port 6818/TCP to all other compute nodes, and this
seems to have resolved the issue. Thanks again, Matthieu!
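For the archives, the rules I ended up with look roughly like this (a
sketch; 10.0.0.1 stands in for our controller/DBD machine, 10.0.0.0/24 for
the cluster network, and it assumes the default slurmd port of 6818):

    # pass all traffic from the SLURM controller / DBD machine
    iptables -A INPUT -s 10.0.0.1 -j ACCEPT
    # allow slurmd traffic from the other compute nodes
    iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 6818 -j ACCEPT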
Best,
Sean
On Thu, May 17, 2018 at 8:06 PM, Sean Caron wrote:
> Awesome tip. Thanks so much
I'd recommend using the first one.
>
> HTH
> Matthieu
>
> PS : you can look at that presentation for a few details on the
> communication logic :
> https://slurm.schedmd.com/SUG14/message_aggregation.pdf
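> For reference, the slurm.conf parameters that control those ports (the
> values below are the defaults; a sketch, adjust to your site):
>
>     SlurmctldPort=6817
>     SlurmdPort=6818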
>
>
>
> 2018-05-17 22:21 GMT+02:00 Sean Caron :
>
>>
them and there are no issues with the nodes getting to slurm.conf.
Best,
Sean
On Thu, May 17, 2018 at 1:21 PM, Patrick Goetz
wrote:
> Does your SMS have a dedicated interface for node traffic?
>
> On 05/16/2018 04:00 PM, Sean Caron wrote:
>
>> I see some chatter on 6818/TCP
les and look at what traffic is actually
> being blocked?
>
> On Wed, May 16, 2018 at 11:11 AM Sean Caron wrote:
>
>> Hi all,
>>
>> Does anyone use SLURM in a scenario where there is an iptables firewall
>> on the compute nodes on the same network it uses to communicate with the
>> SLURM controller and DBD machine?
Hi all,
Does anyone use SLURM in a scenario where there is an iptables firewall on
the compute nodes on the same network it uses to communicate with the SLURM
controller and DBD machine?
I have the very basic situation where ...
1. There is no iptables firewall enabled at all on the SLURM controller
Best,
Sean
On Fri, Apr 13, 2018 at 12:27 PM, Patrick Goetz
wrote:
> On 04/11/2018 02:35 PM, Sean Caron wrote:
>
>> As a protest against asking questions on this list and getting
>> solicitations for pay-for support, let me give you some advice for free :)
>>
>>
> Now, now. Paid support is how they keep the project going. You like
> using Slurm, right?
>
>
>
>
Hi Matt,
As a protest against asking questions on this list and getting solicitations
for pay-for support, let me give you some advice for free :)
If you look at your slurm.conf you'll see there are two directories that
your slurm user and group need to have write access to.
One is whatever you configured
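(A sketch of the sort of check I mean; it assumes the two directories are
StateSaveLocation for slurmctld and SlurmdSpoolDir for slurmd, and that
your config lives at /etc/slurm/slurm.conf:)

    # find the two directories your slurm.conf points at
    grep -Ei 'StateSaveLocation|SlurmdSpoolDir' /etc/slurm/slurm.conf
    # then confirm the slurm user and group can write to each one, e.g.
    ls -ld /var/spool/slurmctld    # placeholder path from the grep above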
> seems to fix the issue.
>
> Otherwise this can happen due to massive traffic to the slurmctld. You
> can try using the defer option for SchedulerParameters. That slows down
> the scheduler so it can handle the additional load.
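> For example, in slurm.conf (a sketch; defer can be combined with any
> SchedulerParameters values you already set):
>
>     SchedulerParameters=defer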
>
> -Paul Edmon-
>
>
>
> On 11/8/
Hi all,
I see SLURM 17.02.9 slurmctld hang or become unresponsive every few days
with the message in syslog:
server_thread_count over limit (256), waiting
I believe from the user perspective they see "Socket timed out on send/recv
operation". Slurmctld never seems to recover once it's in this st