Hello,
I have a queue with 6 servers.
When 4 of the servers are under heavy load and I send new jobs to the other 2
servers, which are free and in a different partition with different features,
the jobs still stay pending (they can take 20 minutes to start running).
If I change their priority with "s
d HPC Team Lead | Research Platform Services
Research Computing | CoEPP | School of Physics
University of Melbourne
On Sun, 14 Jul 2019 at 18:41, Zohar Roe MLM <rzoh...@iai.co.il> wrote:
Hello,
I have two servers in my slurm.conf:
NodeName=serv1 NodeAddr=131.100.100.1 CPUs=4 R
Hello,
I have two servers in my slurm.conf:
NodeName=serv1 NodeAddr=131.100.100.1 CPUs=4 RealMemory=256000 Features=test,workserv
NodeName=serv2 NodeAddr=131.100.100.2 CPUs=4 RealMemory=256000 Features=test,workserv
When I send a job with feature "test", the server "serv1" always ge
hat the "hostname" command returns the same name that Slurm
> expects on your compute nodes.
>
> ____________
> From: Zohar Roe Mlm
> Sent: Thursday, October 25, 2018 3:02AM
> To: 'Slurm User Community List'
> Cc:
> Subject: Re: [slu
the server
can't find it (and it happens every two minutes, always).
Thanks for your ideas,
Roy.
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Lachlan Musicman
Sent: Thursday, October 25, 2018 1:59 AM
To: Slurm User Community List
Subject: Re: [slurm-use
Hello,
I have a node that for some reason changes state to "Down" every few minutes.
When I set it to "resume" with scontrol, it is OK until it goes Down again.
In the slurm server log I can see the error:
"agent/is_node_resp: node:myName1 RPC:REQUEST_PING : Can't find an address,
check slurm.conf"
Now, The
] Question
about sacct
Is accounting set up to use a slurmdbd/database backend or file
(AccountingStorageType)?
3 minutes could make sense if data are being stored in a (large) flat file.
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Zohar Roe MLM
Sent: 16 May 2018 07
Hello,
Trying to understand some problems with the "sacct" command.
I sent 10 jobs to slurm and I can see them all running with the squeue command.
Now, when I run "sacct -j 398000" to check one of the jobs, I see two
problems:
1) It takes the sacct command about 3 minutes to return r
econd partition is getting primarily scheduled by the
backfill scheduler. I would try the partition_job_depth option as otherwise
the main loop only looks at priority order and not by partition.
-Paul Edmon-
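A minimal sketch of the suggestion above, assuming the option is set through SchedulerParameters in slurm.conf. The depth value of 100 is an illustrative assumption, not a value recommended in this thread:

```
# slurm.conf fragment (hypothetical value, adjust for the site)
# partition_job_depth: how many jobs the main scheduling loop
# attempts to schedule from each partition.
# If a SchedulerParameters line already exists, append this
# option to it, separated by a comma.
SchedulerParameters=partition_job_depth=100
```

After editing the file, the controller normally has to re-read its configuration (for example with "scontrol reconfigure") before the setting applies.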
On 4/29/2018 5:32 AM, Zohar Roe MLM wrote:
> Hello.
> I have 2 clusters in my
Hello.
I have 2 clusters in my slurm.conf:
CLUS_WORK1
server1
server2
server3
CLUS_WORK2
pc1
pc2
pc3
When I send 10,000 jobs to CLUS_WORK1, they are fine: most start running
while a few stay in the pending state (which is OK).
But if I send new jobs to CLUS_WORK2, which is idle, I see that the j
Hello,
I am having another strange problem with slurm 17.02.6.
I have a cluster with 250 CPUs.
I am sending a test job that only sleeps for 60 seconds.
A lot of the jobs take more than 7 or 8 minutes to finish running
(I can see them in RUNNING mode for more than 7 minutes).
Is there a re
Hello,
Trying again, with the slurm.conf this time.
I have a cluster named Autobot.
In this cluster I have these servers:
Optimus[1-10] and
Megatron[1-10].
I sent 3000 jobs with feature Optimus; some are running while some are
pending, which is OK.
But I have sent 1000 jobs to Megatron and they are a
12 matches