On Friday, 11 May 2018 11:15:49 PM AEST Mahmood Naderan wrote:
> Excuse me... I see the output of squeue which says
> 170 IACTIVE bash mahmood PD 0:00 1 (AssocGrpMemLimit)
>
> I don't understand why the memory limit is reached?
That's based on what your job requests, not what it is actually using.
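A quick way to compare what a job requested with what it is using is
sacct's ReqMem and MaxRSS fields (job 170 taken from the squeue line
above):

# Requested memory vs. peak resident set size for job 170
sacct -j 170 --format=JobID,ReqMem,MaxRSS,State

AssocGrpMemLimit is enforced against the requested amount, so reducing
the job's memory request is what clears it.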
Hey Prentice,
On Friday, 11 May 2018 6:23:06 AM AEST Prentice Bisbal wrote:
> They would like to have their submission framework automatically
> detect if there's a reservation that may interfere with their jobs, and
> act accordingly.
As an additional data point there is also srun's "--test-only" option,
which reports when a job would be expected to start without actually
submitting it.
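A sketch of how a submission framework might use it (the partition name
and walltime here are placeholders):

# Ask when the job would be expected to start, without submitting it
srun --test-only -p IACTIVE -N 1 -t 48:00:00 true

If the estimated start lands inside or beyond a maintenance reservation,
the framework can shorten --time or warn the user.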
On Saturday, 12 May 2018 12:47:29 AM AEST Barry Moore wrote:
> This works perfectly, I appreciate the pointer.
Great to hear! My pleasure.
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
On Saturday, 12 May 2018 3:35:39 PM AEST Mahmood Naderan wrote:
> Although I specified one compute node in an interactive partition, the
> salloc doesn't ssh to that node.
salloc doesn't do that.
We use a 2 line script called "sinteractive" to do this, it's really simple
(the flags below are a plausible reconstruction: run the user's shell in a
pseudo-terminal on the allocated node):
#!/bin/bash
# Forward any user options to srun, then start an interactive login shell
exec srun "$@" --pty -u "$SHELL" -il
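Used as, for example:

sinteractive -p IACTIVE -n 1

which drops you into a shell on an allocated compute node, no manual ssh
required.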
Hi,
Although I specified one compute node in an interactive partition, the
salloc doesn't ssh to that node. See below
[mahmood@rocks7 ~]$ scontrol show partition IACTIVE
PartitionName=IACTIVE
AllowGroups=ALL AllowAccounts=em1 AllowQos=ALL
AllocNodes=rocks7 Default=NO QoS=N/A
DefaultTime=
Hi Chris,
Thank you for your comments. I will look at EasyBuild. There are quite a few
options to automate the creation of software modules.
I will be doing lots of reading this weekend.
By the way, I signed up to the Beowulf mailing list.
Thank you,
Eric
Hi John,
Regarding NFS shares and Python, and plenty of other packages too,
pay attention to where the NFS server is located on your network.
The NFS server should be part of your cluster, or at least have a network
interface on your cluster fabric.
If you perhaps have a home directory server which is a ca
Hi Miguel,
Thank you for your comment. That sounds pretty straightforward.
Did you never have issues with programs relying on system files or on
the home directory location?
Thanks
Eric
Hi All,
I'm trying out GrpTRESRunMins to prevent users from
opportunistically flooding an empty partition with long jobs. We have a
partition set up for each CPU type, and give each association
(account/user/partition) a separate limit based on that account's share of
the partition.
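For reference, that kind of limit is set on the association with sacctmgr;
a rough example (account name and value invented):

# Cap account "em1" at one million running CPU-minutes, i.e. the sum
# over its running jobs of (allocated CPUs x remaining walltime):
sacctmgr modify account where account=em1 set GrpTRESRunMins=cpu=1000000

New jobs for the association then stay pending once starting them would
push that sum over the cap.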
Thank you all for your answers, I will research some more along these lines!
Any other opinions are welcome.
Regards,
Antonio
On 11/05/18 at 16:05, Vicker, Darby (JSC-EG311) wrote:
I’ll second that – we have a cluster with 4 generations of nodes. We
assign a processor type feature to each node and require the users to ask
for at least one of those features in their jobs.
Chris,
This works perfectly, I appreciate the pointer.
Thanks,
Barry
On Fri, May 11, 2018 at 06:19:40PM +1000, Chris Samuel wrote:
> On Friday, 11 May 2018 4:54:32 AM AEST Barry Moore wrote:
>
> > Is it possible to track all jobs which requested a specific license? I am
> > using Slurm 16.05.6
A feature that many Slurm users might like is sbatch --time-min. Using
both --time-min and --time a user can specify the range of acceptable
wall time limits, and the scheduler may then shrink the job's limit
(down to --time-min) to fit it in before the reservation. This can make
it much easier to keep jobs running right up to the maintenance
reservation. e.g.:
sbatch --time-min=30:00 --time=48:00:00
In the “other ramifications” category, if you aren’t already planning to, you
might consider making this change during a maintenance period when all jobs are
drained. We tried to change from SelectType=select/linear to
SelectType=select/cons_res on the fly once (via “scontrol reconfig”) and
di
In the past we used the Lua job_submit plugin to block jobs that would
intersect maintenance reservations. I would look at that.
-Paul Edmon-
On 05/11/2018 08:19 AM, Bill Wichser wrote:
The problem is that reservations can be in there yet have no effect on
the submitted job if they would run before the reservation takes place.
I’ll second that – we have a cluster with 4 generations of nodes. We assign a
processor type feature to each node and require the users to ask for at least
one of those features in their jobs via job_submit.lua – see the code below.
For a job that can run on any processor type, the constraint can simply OR
the features together.
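As a hedged illustration of the job side (feature names invented):

# Accept any of three CPU generations; Slurm picks nodes matching
# at least one of the OR'd features.
sbatch -C "broadwell|haswell|skylake" job.sh

The job_submit.lua check then only has to reject jobs whose constraint
mentions none of the known features.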
Excuse me... I see the output of squeue which says
170 IACTIVE bash mahmood PD 0:00 1 (AssocGrpMemLimit)
I don't understand why the memory limit is reached? I cannot see the
memory usage of a running job from sacct commands. However, using
"top" on the compute node, I see 6 cores
On 10 May 2018, at 19:48, Christopher Benjamin Coffey
wrote:
> We noticed that recently --uid, and --gid functionality changed where
> previously a user in the slurm administrators group could launch jobs
> successfully with --uid and --gid, allowing for them to submit jobs as
> another user.
The problem is that reservations can be in there yet have no effect on
the submitted job if they would run before the reservation takes place.
One can pull the starting time simply using something like this
scontrol show res -o | awk '{print $2}'
with output
StartTime=2018-06-12T06:00:00
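A sketch of the check a submit wrapper could then make (bash with GNU
date; the 48 hour walltime is a placeholder):

# Seconds until the next reservation starts
start=$(scontrol show res -o | awk '{print $2}' | cut -d= -f2)
gap=$(( $(date -d "$start" +%s) - $(date +%s) ))
# Flag walltimes that would overlap the reservation
want=$(( 48 * 3600 ))
if [ "$want" -gt "$gap" ]; then
    echo "requested walltime crosses the reservation" >&2
fi

As noted above, this only matters for jobs that could still be running at
StartTime; shorter jobs can ignore the reservation entirely.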
Hi
I have added a user to multiple partitions. That account name actually
corresponds to a set of limitations which I define for a user.
[root@rocks7 ~]# sacctmgr list association format=partition,account,user,grptres,maxwall
 Partition    Account       User       GrpTRES     MaxWall
---------- ---------- ---------- ------------- -----------
You can use node features when defining the node types in slurm.conf.
Then when requesting resources for the job, use -C to restrict it to
just those node types.
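For instance (node and feature names made up), the slurm.conf side could
look like:

# slurm.conf: tag each hardware generation with a feature
NodeName=node[01-16] CPUs=32 RealMemory=128000 Feature=broadwell
NodeName=node[17-24] CPUs=40 RealMemory=192000 Feature=skylake

and a job then picks one type with "sbatch -C skylake".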
On Fri, May 11, 2018, 5:38 AM Antonio Lara wrote:
> Hello everyone,
>
> Hopefully someone can help me with this, I cannot find in the manual if
> this is even possible:
Hello everyone,
Hopefully someone can help me with this, I cannot find in the manual if
this is even possible:
I'm a system administrator, and the following question is from the
administrator point of view, not the user's point of view:
I work with a cluster which has a partition containing
Hey Michael!
On Friday, 11 May 2018 1:00:24 AM AEST Michael Jennings wrote:
> I'm surprised to hear that; this is the first time I've ever heard
> that in regards to SLURM. I'd only ever heard folks complain about
> TORQUE having that issue.
Hmm, you might well be right, I might have done that
On Friday, 11 May 2018 4:54:32 AM AEST Barry Moore wrote:
> Is it possible to track all jobs which requested a specific license? I am
> using Slurm 16.05.6. I looked through `sacct ... --format=all`, but maybe I
> am missing something.
I don't think licenses are stored in Slurmdbd by default.
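They can be recorded if the license is listed as a TRES, though; a hedged
example (license name invented, and only jobs run after the change are
covered):

# slurm.conf: track the "matlab" license in accounting
#   AccountingStorageTRES=license/matlab
# then pull out the jobs that requested it:
sacct -a -X --format=JobID,User,ReqTRES | grep license/matlab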
On Friday, 11 May 2018 4:48:16 AM AEST Christopher Benjamin Coffey wrote:
> What was the reasoning in making this change? Do people not trust the folks
> in the slurm administrator group to allow this behavior? Seems odd.
The change was here:
https://github.com/SchedMD/slurm/commit/52086a9bc0ff2
On Friday, 11 May 2018 5:11:38 PM AEST John Hearns wrote:
> Eric, my advice would be to definitely learn the Modules system and
> implement modules for your users.
I will echo that, and the suggestion of shared storage (we use our Lustre
filesystem for that). I would also suggest looking at a s
Hi,
I install all my apps on shared storage and set the environment variables
(PATH and friends) with Lmod. It's very useful.
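Day-to-day use then looks like this (paths and module names invented):

# Point Lmod at the shared modulefile tree, then load an app
module use /shared/sw/modulefiles
module load foo/1.2.3

Each modulefile just prepends the application's bin and lib directories
to the relevant variables.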
Regards.
On Fri., 11 May 2018 at 6:19, Eric F. Alemany ()
wrote:
> Hi Lachlan,
>
> Thank you for sharing your environment. Everyone has their own set of
> rules