But can the power saving plugin already select different nodes according to
the configuration file with predefined nodes?
https://slurm.schedmd.com/elastic_computing.html
Many thanks in advance.
Regards,
Mike
Thanks, Ole, that's perfect.
----
Mike Hanby
mhanby @ uab.edu
Systems Analyst II - Enterprise
IT Research Computing Services
The University of Alabama at Birmingham
On 6/13/18, 4:22 AM, "slurm-users on behalf of Ole Holm Nielsen"
wrote:
On 06/12/2018
Hi everyone,
I'm getting this error lately for everyone's jobs, which results in memory not
being constrained via the cgroups plugin.
slurmstepd: error: task/cgroup: unable to add task[pid=21681] to memory cg
'(null)'
slurmstepd: error: jobacct_gather/cgroup: unable to instanciate user 3691
m
fy during
>an academic semester.
--mike
-Original Message-
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Chris Samuel
Sent: Monday, September 10, 2018 6:49 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] can't create memory group (cg
allows.
-Original Message-
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Mike Cammilleri
Sent: Monday, September 10, 2018 9:49 AM
To: Slurm User Community List
Subject: Re: [slurm-users] can't create memory group (cgroup)
Thanks everyone for
e failed
perl: error: cannot create accounting_storage context for
accounting_storage/slurmdbd
Job not found.
Mike Cammilleri
Systems Administrator
Department of Statistics | UW-Madison
1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu
F
uster perhaps the fix will
be available then.
Mike Cammilleri
Systems Administrator
Department of Statistics | UW-Madison
1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu
From: slurm-users on behalf of Chris
Samuel
Sent: Tuesday, Novem
master started at 1.
Where is this information stored / can I manually set it so that we aren’t
starting over / battling duplicate JOBIDs in the accounting information?
Thanks, MIke
Mike Hanby
mhanby @ uab.edu
Systems Analyst II - Enterprise
IT Research Computing Services
The
Awesome, thanks all. Looks like FirstJobId is what I need. We know what job
id was used for the final job.
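For reference, FirstJobId is a slurm.conf parameter; a minimal sketch (the value below is hypothetical — set it just above the last job id recorded in the old accounting data):

```
# slurm.conf (restart slurmctld after changing)
FirstJobId=1000001
```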
Thanks!
Mike Hanby
mhanby @ uab.edu
Systems Analyst II - Enterprise
IT Research Computing Services
The University of Alabama at Birmingham
From: slurm-users
that they submit the jobs as an array and limit with %7,
but maybe there's a more elegant solution using the config.
Any tips appreciated.
Mike Cammilleri
Systems Administrator
Department of Statistics | UW-Madison
1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu
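For reference, the %7 throttle mentioned above is a job-array suffix; a minimal sketch (range and script header are illustrative):

```
#SBATCH --array=0-99%7   # 100 array tasks, at most 7 running at once
```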
lower-weighted node before running on the higher-weighted node. What
we would like is for preemption to only occur when no resources are
available.
Is this possible?
Thanks,
Mike Harvey
Systems Administrator
Bucknell University
har...@bucknell.edu <mailto:har...@bucknell.edu>
2
Unknown option: MaxTRESPerUser=gres/gpu=2
Use keyword 'where' to modify condition
Thanks!
--
Mike Harvey
Systems Administrator
Engineering Computing
Bucknell University
har...@bucknell.edu
ed for the class.
Thoughts? Thanks, Mike
----
Mike Hanby
mhanby @ uab.edu
Systems Analyst III - Enterprise
IT Research Computing Services
The University of Alabama at Birmingham
what we are doing wrong?
Thanks,
Mike
--
*J. Michael Mosley*
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC 28223
704.687.7065 | jmmos...@uncc.edu
o get that
functionality?
If someone could post just a simple slurm.conf file that forces the memory
limits to be honored (and kills the job if they are exceeded), then I could
extract what I need from that.
Again, thanks for the assistance.
Mike
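A minimal sketch of the cgroup-based enforcement being asked about (assuming Slurm built with cgroup support; these are fragments, not complete files):

```
# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# cgroup.conf
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
```

With memory treated as a consumable resource and ConstrainRAMSpace=yes, jobs exceeding their requested memory are constrained (and killed, depending on OOM behavior) by the kernel cgroup rather than by polling.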
On Thu, Oct 24, 2019 at 11:27 PM mercan
wrote:
enforcement mechanism.
Have you gotten this to work with 19.05?
Thanks Mike
On Fri, Oct 25, 2019 at 9:41 AM Mark Hahn wrote:
> > need. I simply want to enforce the memory limits as specified by the
> user
> > at job submission time. This seems to have been the behavior in
>
/ documentation.
Again, thank your time. It was very helpful.
Mike
On Fri, Oct 25, 2019 at 11:26 AM Juergen Salk
wrote:
> Hi Mike,
>
> IIRC, I once did some tests with the very same configuration as
> your's, i.e. `JobAcctGatherType=jobacct_gather/linux´ and
> `JobAcctGatherParams=OverMemoryKill´ and got this to work as expected:
> Jobs were killed when they exceed
vance.
Mike
--
*J. Michael Mosley*
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC 28223
704.687.7065 | jmmos...@uncc.edu
I am running a slurm client on a virtual machine. The virtual machine
originally had a core count of 10. But I have now increased the cores to
16, but "slurmd -C" continues to show 10. I have increased the core count
in the slurm.conf file, and that is being seen correctly. The state of the
node also continues to show the lower original core count.
Specifically, how is slurmd -C getting that info? Maybe this is a kernel
issue, but other than lscpu and /proc/cpuinfo, I don't know where to look.
Maybe I should be looking at the slurmd source?
-Mike
Michael Tie, Technical Director
Northfield, MN 55057
phn: 507-222-4067 | cell: 952-212-8933 | fax: 507-222-4312
m...@carleton.edu
On Tue, Mar 10, 2020 at 12:21 AM Chris Samuel wrote:
> On 9/3/20 7:44 am, mike tie wrote:
>
> > Specifically, how is slu
compute node on it, and resize it appropriately every few
months (why not put it to work). I started with 10 cores, and it looks
like I can up it to 16 cores for a while, and that's when I ran into the
problem.
-mike
Michael Tie, Technical Director
Mathematics, Statistics, and Comput
bogged down doing backfill processing.
Is there any way to limit the maximum number of jobs a single user can have in
the queue at any given time?
Mike Hanby
mhanby @ uab.edu
Systems Analyst III - Enterprise
IT Research Computing Services
The University of Alabama at Birmingham
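One common approach, sketched below with hypothetical names and values: set MaxSubmitJobs on the user's association, or MaxSubmitJobsPerUser on a QOS, via sacctmgr. Either limit counts running plus pending jobs.

```
# Cap a single (hypothetical) user at 500 jobs in the queue:
sacctmgr modify user someuser set MaxSubmitJobs=500

# Or cap every user under a QOS at once:
sacctmgr modify qos normal set MaxSubmitJobsPerUser=500
```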
rsWatts=0 ExtSensorsTemp=n/s
Reason=Reboot ASAP [root@2020-08-06T10:29:22]
Any thoughts as to how to cancel the reboot?
Mike Hanby
mhanby @ uab.edu
Systems Analyst III - Enterprise
IT Research Computing Services
The University of Alabama at Birmingham
ing reboot"
scontrol cancel_reboot c01
From: "Hanby, Mike"
Date: Friday, August 7, 2020 at 11:43 AM
To: Slurm User Community List
Subject: Cancel "reboot ASAP" for a node
Howdy, (Slurm 18.08)
We have a bunch of nodes that we've updated to "scontrol reboot A
same script for an older setup and it worked fine. Any suggestion if there is
a workaround to define $SCRATCH, besides getting users to define it in their
submit scripts?
Prolog script
#!/bin/sh
export SCRATCH=/scratch/$SLURM_JOB_USER/$SLURM_JOB_ID
mkdir --parents "$SCRATCH"
Thanks
Mike
m
2. UnkillableStepProgram can be used to send email or reboot a compute node.
The question is: how do we configure it?
scontrol show config | grep -i kill
KillOnBadExit = 1
KillWait= 30 sec
UnkillableStepProgram = (null)
UnkillableStepTimeout = 300 sec
Please advise
Thanks
Mike
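A sketch of the slurm.conf wiring being asked about (script path is hypothetical; what the script does is up to the site):

```
# slurm.conf
UnkillableStepProgram=/usr/local/sbin/unkillable_notify.sh
UnkillableStepTimeout=300
```

The program runs on the compute node when a step fails to die within UnkillableStepTimeout; note that SchedMD recommends a timeout larger than 126 seconds (see the bug linked later in this thread).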
Hi Jürgen
Thanks for the help 😊 TaskProlog did get the job done
Regards
Mike
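For reference, the TaskProlog fix mentioned above can be sketched as follows. Unlike Prolog, which runs outside the job's environment, any line TaskProlog prints in the form `export NAME=value` is injected into the job's environment (paths here are illustrative):

```shell
#!/bin/sh
# TaskProlog sketch: printed "export NAME=value" lines are added
# to the job's environment by slurmd.
echo "export SCRATCH=/scratch/${SLURM_JOB_USER}/${SLURM_JOB_ID}"
```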
-Original Message-
From: slurm-users On Behalf Of Juergen
Salk
Sent: Tuesday, 23 March 2021 8:39 PM
To: Slurm User Community List
Subject: Re: [slurm-users] Slurm prolog export variable
Hi Mike,
for
Hi Chris
Thanks for the clarification
Mike
-Original Message-
From: slurm-users On Behalf Of Chris
Samuel
Sent: Tuesday, 23 March 2021 5:30 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Slurm - UnkillableStepProgram
Hi Mike,
On 22/3/21 7:12 pm, Yap, Mike wrote
o
a value larger than 126 seconds:
https://bugs.schedmd.com/show_bug.cgi?id=11103
From: slurm-users
mailto:slurm-users-boun...@lists.schedmd.com>>
On Behalf Of Yap, Mike
Sent: Monday, March 22, 2021 7:13 PM
To: slurm-users@lists.schedmd.com<mailto:slurm-users@lists.schedmd.com>
Subjec
requested
instead of the default CPU?
Many thanks
Mike
out the value
for GrpTRESMins for your "Association Records":
scontrol show assoc_mgr
Hope that helps!
Luke
From: slurm-users
mailto:slurm-users-boun...@lists.schedmd.com>>
On Behalf Of Yap, Mike
Sent: Wednesday, March 31, 2021 4:50 PM
To: slurm-us...@schedmd.com<mailto:slurm
Fixed the issue with TRESBillingWeights.
It seems I will need to set PartitionName for it to work:
https://bugs.schedmd.com/show_bug.cgi?id=3753
PartitionName=DEFAULT TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
From: slurm-users On Behalf Of Yap, Mike
Sent: Wednesday, 7 Ap
want to make predominant.
From: slurm-users On Behalf Of Yap, Mike
Sent: Wednesday, 7 April 2021 9:57 AM
To: Slurm User Community List
Subject: Re: [slurm-users] Fairshare +FairTree Algorithm + TRESBillingWeights
Thanks, Luke. Will go through the 2 commands (will try to digest them).
Wondering
jobs but assigned between 1a and 1b, the job
will run on the 1b node but there is no activity on 1a.
Any suggestions?
Thanks
Mike
pt'
#
My configure was
./configure --prefix=/opt/ohpc/pub/slurm/slurm-20.11.7
--with-pmix=/opt/ohpc/pub/slurm/pmix-4.0.0 --sysconfdir=/etc/slurm
but even if I add "--with-pam", I get the same result with make.
Can someone point me to what I'm missing?
---
Mike VanHorn
Senio
I was missing pam-devel.
Thank you!
---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanh...@wright.edu
On 7/14/21, 6:44 AM, "slurm-users on behalf of Ole Holm Ni
t normally take?
Thanks for the advice.
Mike
https://slurm.schedmd.com/acct_gather.conf.html
s On Behalf Of Paul
Brunk
Sent: Tuesday, 23 November 2021 4:57 AM
To: Slurm User Community List
Subject: Re: [slurm-users] AcctGatherProfileType
Hi Mike:
Your Slurm must be built with HDF5 support (if 'configure' could find
the hdf5-devel files, it should have been enabled automatically)
ULoad=6.08
AvailableFeatures=hi_mem,data,scratch
ActiveFeatures=hi_mem,data,scratch
Thanks for any insight,
Mike
mctld Restart
In slurm.conf, we just add the Features to the node description. Is that what
you were looking for?
NodeName=compute-4-4 … Weight=15 Feature=gen10
Jeff
UH IT - HPC
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Hanby, Mike
Sent: Thursday, June 2, 2
ht be a neater way to
do this?
Regards,
Mike
actly what I'm looking for. The values outside the brackets are the
qos limit, and the values within are the current usage.
Regards,
Mike
-Original Message-
From: slurm-users On Behalf Of Ole Holm
Nielsen
Sent: 28 November 2022 18:58
To: slurm-users@lists.schedmd.com
Subject: [Exter
from below it
was only a few minutes work to write a small script to extract the info I
needed.
Regards,
Mike
-Original Message-
From: slurm-users On Behalf Of Ole Holm
Nielsen
Sent: 29 November 2022 15:25
To: Slurm User Community List
Subject: Re: [slurm-users] [External] Re: Per
upgrades, ease of reproducing the
cluster in development, etc…
How about it, anyone running containerized Slurm server processes in production?
Thanks, Mike
-03-15T01:21:21 juju-65df3d-2
Mike
From: slurm-users on behalf of Hanby,
Mike
Date: Wednesday, February 15, 2023 at 1:51 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Running Containerized Slurmctld and Slurmdb in
Production?
Howdy,
Just wondering if any sites are running contain
(max or sum of individual trackable resources) billing unit. Note that by
default this value equals the number of CPUs used.
Thanks,
-Mike
USA
Sent from my iPhone
> On Mar 25, 2023, at 11:21 AM, Thomas Arildsen wrote:
>
> I am experimenting with getting information from a Slurm clust
the file in the
source tree and the file that got installed to /usr/bin/seff
$ diff contribs/seff/seff /usr/bin/seff
11c11
< use lib "${FindBin::Bin}/../lib/perl";
---
> use lib qw(/usr/lib64/perl5);
Mike Robbert
Cyberinfrastructure Specialist, Cyberinfrastructure and A
why.
Is there something I need to change in the slurm.conf to allow this to work?
---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanh...@wright.edu
They should not affect the task performance.
Maybe the cluster configuration allocated slow machines for salloc.
salloc and sbatch have different purposes:
salloc is used to allocate a set of resources to a job. Once the resources have
been allocated, the user can run a command or script on t
performance? I have submitted the task with the help of "srun {{ name_of_task }} --pty bash", and the result is the same as for launching with salloc. Thanks in advance!
On Tue, 4 Jul 2023 at 15:51, Mike Mikailov <mmikai...@gmail.com> wrote:
They should not affect the task performance. Ma
t 9:11 AM, Loris Bennett wrote:
>
> Mike Mikailov writes:
>
>> They should not affect the task performance.
>>
>> May be the cluster configuration allocated slow machines for salloc.
>>
>> salloc and sbatch have different purposes:
>>
>> *
And if the Slurm workers are identical, what can be the reason? Can interactive mode affect the performance? I have submitted the task with the help of "srun {{ name_of_task }} --pty bash", and the result is the same as for launching with salloc. Thanks in advance!
On Tue, 4 Jul 2023 at 1
reasonable time not indefinite in Slurm.
Sent from my iPhone
> On Jul 5, 2023, at 1:43 AM, Loris Bennett wrote:
>
> Mike Mikailov writes:
>
>> About the last point. In the case of sbatch the jobs wait in the queue as
>> long as it takes until the resources are a
ote:
>
> Mike Mikailov writes:
>
>> Thank you Loris, for the further clarifications. The only question is
>> who will wait forever in interactive mode? And how practical is it?
>>
>> Interactive mode is what its name implies - interactive, not queueing.
>
>
nodes which are also used for non-interactive jobs. I've just confirmed that LSF and PBS documentation use the term "interactive" in the same way as Slurm.
-Paul
On Wed, Jul 5, 2023 at 7:06 AM Mike Mikailov <mmikai...@gmail.com> wrote:
Thank you Loris, for the further feedback.
“
function 'cgroup_dbus_attach_to_scope':
cgroup_dbus.c:350:29: error: 'DBUS_MESSAGE_ITER_INIT_CLOSED' undeclared (first
use in this function); did you mean 'DBUS_MESSAGE_TYPE_INVALID'?
DBusMessageIter args_itr = DBUS_MESSAGE_ITER_INIT_CLOSED;
^~~
functional equivalent of this in SLURM:
* I can set the whole node to Drain
* I can set the whole partition to Inactive
Is there some way to 'disable' partition y just on node1?
Regards,
Mike
different from the static one, I guess best practice is to ensure
slurm.conf's partition definitions are also edited?
Regards,
Mike
-Original Message-
From: slurm-users On Behalf Of Feng
Zhang
Sent: Friday, August 4, 2023 7:36 PM
To: Slurm User Community List
Subject: [External]
Hello,
Can someone please tell me which version of Slurm introduced the
ReservedCoresPerGPU parameter?
Thanks,
-Mike
Thank you for the quick reply Ryan.
I heard about ReservedCoresPerGPU at the recent SuperComputing conference.
Do you mean ReservedCoresPerGPU is not available yet?
Thanks,
- Mike
> On Nov 27, 2023, at 5:34 PM, Ryan Novosielski wrote:
>
> Looks like 24.08 to me, so s/i
Thank you, Ryan.
I also found their roadmap, it states the same, August, 2024.
Thank you, again.
Sent from my iPhone
> On Nov 27, 2023, at 6:29 PM, Ryan Novosielski wrote:
>
> It does appear that way. Slurm versions are YY.MM.
>
>>> On Nov 27, 2023, at 17:43,
we're aiming to use fairshare even on the no-time-limits
partitions to help balance out usage).
Hoping someone can provide pointers.
Regards,
Mike
on a pending job by running “qstat -j
jobid”. But there doesn’t seem to be any functional equivalent with SLURM?
Regards,
Mike
From: slurm-users On Behalf Of Davide
DelVento
Sent: Monday, December 11, 2023 4:23 PM
To: Slurm User Community List
Subject: [External] Re: [slurm-users
b from grabbing more cores than assigned at submit time. Is there something
else I should be configuring to safeguard against this behavior? If SLURM
assigns 1 cpu to the task then no matter what craziness is in the code, 1 is
all they're getting. Possible?
Thanks for any insight!
--mike
..@lists.schedmd.com] On Behalf Of
Chris Samuel
Sent: Friday, December 8, 2017 6:46 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] detectCores() mess
On 9/12/17 4:54 am, Mike Cammilleri wrote:
> I thought cgroups (which we are using) would prevent some of this
> beh
s it to do otherwise. In this case, the user is using
R-3.4.3/bin/Rscript
Thanks!
mike
tasks.
-Original Message-
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of
Mike Cammilleri
Sent: Tuesday, February 13, 2018 10:31 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Are these threads actually unused?
I posted a question similar to this
to run, even if it would push beyond
the AssocGrpCPURunMinutesLimit value? This would be on a case-by-case
basis, and a solution that requires administrative interaction is perfectly
fine.
Thanks for any help you can provide.
--
Mike Renfro / HPC Systems Administrator, Information Technology
00:19:01 core-walltime
Memory Utilization 7.53 MB
Memory Efficiency: 75.3% of 10 MB
I've come up empty Google'ing and figured I'd ask here before coding my own.
Thanks, Mike
----
Mike Hanby
mhanby @ uab.edu
Systems Analyst II - Enterprise
IT Research Computing Services
users supply a
"--time" parameter via a job submit plugin. Would that be a fair statement?
Thank you in advance!
-Mike Schor
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
at the job is
> pending? If you do squeue or scontrol show job it should list the reason
> why it's pending. If it's Resources, then the scheduler is waiting for
> sufficient resources to free up. If it's Priority then the
> job is pending due to other jobs ahead of it.
>
]*'|grep -o
'gres/gpu=[0-9]*([0-9]*)'
Regards,
Mike
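The truncated pipeline above presumably pulls the gres/gpu limit and its bracketed current usage out of the assoc_mgr TRES string. A self-contained sketch of the pattern (the sample input line is made up; a real one would come from `scontrol show assoc_mgr`):

```shell
# Extract "gres/gpu=<limit>(<usage>)" tokens from a sample TRES line.
echo 'MaxTRESPU=cpu=128(16),mem=512G(64G),gres/gpu=8(3)' \
  | grep -o 'gres/gpu=[0-9]*([0-9]*)'
```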
From: Alastair Neil via slurm-users
Sent: Tuesday, February 6, 2024 11:30 PM
To: slurm-us...@schedmd.com
Subject: [External] [slurm-users] Is there a way to list allocated/unallocated
resources defined in a QoS?
This email originated o
“From:” address that
Slurm uses. Is there a way to do this and I’m just missing it?
---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanh...@wright.edu
Ah, that looks like what I need, I was just looking in the wrong place.
Thank you!
---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanh...@wright.edu
From: Paul Edmon via
s your needs.
Mike Robbert
Cyberinfrastructure Specialist, Cyberinfrastructure and Advanced Research
Computing
Information and Technology Solutions (ITS)
303-273-3786 | mrobb...@mines.edu <mailto:mrobb...@mines.edu>
On 7/8/24, 14:20, "Dan Healy via slurm-users"
wrote:
Dear All,
Does anyone know the equivalent of the ResvCPURAW field of the sacct command
in Slurm version 23.11.5?
I get:
sacct: error: Invalid field requested “ResvCPURAW”.
Your help at your earliest convenience is very much appreciated.
Best,
-Mike
Sent from my iPhone