nstances)
L1i: 768 KiB (16 instances)
L2: 14 MiB (10 instances)
L3: 30 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-23
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma
=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes:gpu03
Dependency=(null)
Paint me surprised...
Diego
On 07/12/2024 10:03, Diego Zuccato via slurm-users wrote:
Hi Davide.
On 06/12/2024 16:42, Davide DelVento wrote:
I find it extremely hard to understand situations like this. I wish
'long' (10, IIRC).
Diego
On Fri, Dec 6, 2024 at 7:36 AM Diego Zuccato via slurm-users <slurm-users@lists.schedmd.com> wrote:
Hello all.
A user reported that a job wasn't starting, so I tried to replicate the request and
llocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
-8<--
So the node is free, the partition does not impose extra limits (used
only for accounting factors) but the job does not start.
Any hints?
Tks
--
Diego Zuccato
DIFA - Dip.
e their jobs
last longer than the wall time limit by suspending and resuming a job?
Best,
*Fritz Ratnasamy*
Data Scientist
Information Technology
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 4
est it).
Diego
On 20/11/2024 12:37, Ole Holm Nielsen via slurm-users wrote:
On 20-11-2024 08:28, shaobo liu wrote:
DSP (Digital Signal Processing) is a type of hardware accelerator.
Which Linux operating system is your DSP running? Is the DSP device
hosted in a normal Linux server?
/
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
264 idle CPUs.
Not sure if it's a known bug, or an issue with our config? I have tried
various things, like setting the sockets/boards in slurm.conf.
Thanks
Jack
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Bert
urmctldLogFile: "/var/log/slurm/slurmctld.log"
SlurmdLogFile: "/var/log/slurm/slurmd.log"
SlurmdSpoolDir: "/var/spool/slurm/d"
SlurmUser: "{{ slurm_user.name }}"
SrunPortRange: "60000-61000"
StateS
eed to specify the partition at all. Any thoughts?
Dietmar
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
On 06/03/2024 13:49, Gestió Servidors via slurm-users wrote:
And how can I reject the job inside the lua script?
Just use
return slurm.FAILURE
and the job will be refused.
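For reference, a minimal sketch of a complete job_submit.lua built around that return code (the "debug" partition rule is purely a hypothetical example; slurm_job_submit/slurm_job_modify, slurm.log_user, slurm.FAILURE and slurm.SUCCESS are the standard entry points and constants of the Lua job_submit plugin):
-- job_submit.lua (sketch): refuse a submission from Lua
function slurm_job_submit(job_desc, part_list, submit_uid)
   -- hypothetical rule: only root may submit to a partition named "debug"
   if job_desc.partition == "debug" and submit_uid ~= 0 then
      -- message shown to the submitting user
      slurm.log_user("Submissions to the debug partition are restricted")
      return slurm.FAILURE  -- the job is rejected at submit time
   end
   return slurm.SUCCESS
end
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end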
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
end
end
end
return slurm.SUCCESS
end
However, if I submit a job with a TimeLimit of 5 hours, the lua script doesn't
modify the submission and the job remains "pending"…
What am I doing wrong?
Thanks.
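Without seeing the whole script it's hard to say; for comparison, here is only a hedged sketch of how such a time-based adjustment is usually written (the 300-minute threshold and the "long" partition are made-up values; job_desc.time_limit is expressed in minutes and reads as Slurm's NO_VAL sentinel when the submitter sets no limit):
-- job_submit.lua (sketch): route long jobs to another partition
local NO_VAL = 4294967294  -- sentinel Slurm uses for unset 32-bit fields
function slurm_job_submit(job_desc, part_list, submit_uid)
   -- job_desc.time_limit is in minutes; skip jobs without an explicit limit
   if job_desc.time_limit ~= nil and job_desc.time_limit ~= NO_VAL
      and job_desc.time_limit > 300 then
      -- hypothetical: anything longer than 5 hours goes to partition "long"
      job_desc.partition = "long"
      slurm.log_info("job from uid " .. submit_uid .. " moved to partition long")
   end
   return slurm.SUCCESS
end
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end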
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
S
I guess NVIDIA
had something in mind when they developed MPS, so our pattern
may not be typical (or at least not universal), and in that case the MPS
plugin may well be what you need.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Univ
SQL replication, then manually switch slurmdbd to a replication
slave if the master goes down? Do you do something else?
Thanks.
Daniel
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bolo
and sets itself to drained. Another possibility is that
slurmctld detects a mismatch between the node and its config: in this
case you'll find the reason in slurmctld.log.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le
all_nodes* drained 32 2:8:2 6 0
1 (null) batch job complete f
You have to RESUME the node so it starts accepting jobs.
scontrol update nodename=compute-0 state=resume
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di
hus penalise everyone who requests
large amounts of memory, whether it is needed or not.
Therefore I would be interested in knowing whether one can take into
account the *requested but unused memory* when calculating usage. Is
this possible?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZED
and a job array range.
I tried to add "-v" to the sbatch to see if that gives more useful info,
but I couldn't get any more insight. Does anyone have any idea why it's
rejecting my job?
thanks,
Noam
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatic
able.id_resv does have 15 different values
(including 0).
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
at the slurm.conf description is misleading.
Noam
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
e to write that snippet in
job_submit.lua ...
Would you expect that to prevent the job from ever running on
any partition? Currently (and, I think, wrongly) that's exactly what happens.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università
y
the correct account.
On Thu, Sep 21, 2023 at 3:11 AM Diego Zuccato <diego.zucc...@unibo.it> wrote:
Hello all.
We have one partition (b4) that's reserved for an account while the
others are "free for all".
The problem is that
sbatch --partitio
d having to replicate scheduler logic in
job_submit.lua... :)
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
contain enough
nodes to satisfy the request. That seems to also apply to the all_partitions
job_submit plugin, making it nearly useless.
We're using Slurm 22.05.6. On 20.11.4 it worked as expected (excluding
partitions that couldn't satisfy the request).
Any hint?
TIA
--
Diego Zuccat
not be a great
problem if the reservation remained...
A reservation should only get deleted when expired, IMO (but I can
understand that there are cases where the current behaviour is desired).
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università
ght on this topic. Your expertise and
assistance would greatly help me in successfully completing my project.
Thank you in advance for your time and support.
Best regards,
Maysam
Johannes Gutenberg University of Mainz
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Ok, PEBKAC :)
When creating the reservation, I set account=root. Just adding
"account=" to the update fixed both errors.
Sorry for the noise.
Diego
On 04/05/2023 07:51, Diego Zuccato wrote:
Hello all.
I'm trying to define a reservation that only allows users in a
p id
[root@slurmctl ~]# getent group res-TEST
res-TEST:*:1180406822:testuser
The group comes from AD via sssd.
What am I missing?
TIA
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel
e two default partitions? In the best case
in a way that slurm schedules to partition1 by default and only to
partition2 when partition1 can't handle the job right now.
Best regards,
Xaver Stiensmeier
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater
why some of us do so many experimental runs of jobs and
gather timings. We have yet to see a 100% efficient process, but folks
are improving things all the time.
Brian Andrus
On 2/13/2023 9:56 PM, Diego Zuccato wrote:
I think that's incorrect:
> The concept of hyper-threading is not d
,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126)
Has anyone faced this or a similar issue and can give me some
directions?
Best wishes
Sebastian
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
o...@gmail.com>
Webpage: http://www.ph.utexas.edu/~daneel/
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
That's probably not optimal, but could work. I'd go with brutal
preemption: swapping 90+G can be quite time-consuming.
Diego
On 07/02/2023 14:18, Analabha Roy wrote:
On Tue, 7 Feb 2023, 18:12 Diego Zuccato <diego.zucc...@unibo.it> wrote:
RAM used by
ics
<http://www.buruniv.ac.in/academics/department/physics>
The University of Burdwan <http://www.buruniv.ac.in/>
Golapbag Campus, Barddhaman 713104
West Bengal, India
Emails: dan...@utexas.edu,
a...@phys.buruniv.ac.in
nd reference to the "default partition" in `JobSubmitPlugins`
and this might be the solution. However, I think this is something so
basic that it probably shouldn't need a plugin so I am unsure.
Can anyone point me towards how setting the default partition is done?
Best regards,
On 21/10/2022 19:14, Rohith Mohan wrote:
IIUC this could be the source of your problem:
SelectTypeParameters=CR_CPU_Memory
Maybe try CR_Core_Memory. CR_CPU* has no notion of
sockets/cores/threads.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma
d between controllers, right?
Possibly use NVME-backed (or even better NVDIMM-backed) NFS share. Or
replica-3 Gluster volume with NVDIMMs for the bricks, for the paranoid :)
Diego
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Ber
too.
Regards,
--
Willy Markuske
HPC Systems Engineer
Research Data Services
P: (619) 519-4435
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
On 26/05/2022 11:48, Diego Zuccato wrote:
ace used by the job). Still can't
export TMPDIR=...
from TaskProlog script. Surely missing something important. Maybe
TaskProlog is called as a subshell? In that case it can't alter caller's
env... But IIUC someone made it work, and that confuses me...
--
Diego Zuccato
DIFA - Dip.
ry when the
calculation is done, but I'm sure there must be a better way to do this.
Thanks in advance for the help.
best regards,
Alain
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
}/usr/mpich-4.0.2
gives an executable that only uses 1 CPU even if sbatch requested 52. :(
Any hint appreciated.
Tks.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20
roduced (on newer versions)?
Can this somehow be avoided by setting a default number of tasks or some
other (partition) parameter? Sorry for asking but I couldn't find
anything in the documentation.
Let me know if you need more information.
Best Regards, Benjamin
--
Diego Zuccato
eed to see the users'
home dirs and/or job script dirs.
==
Paul Brunk, system administrator
Georgia Advanced Resource Computing Center
Enterprise IT Svcs, the University of Georgia
On 2/10/22, 6:26 AM, "slurm-users" wrote:
On Thu, 2022-02-
slurmctld need read access to /home/userA/myjob.sh or does it
receive the job script as a "blob" or as a path? Does it even need to
know userA's GID or will it simply use 'userA' to look up associations in
dbd?
Tks.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servi
MBs less than older
one (been there, done that... :( ).
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
Tks.
Will be useful soon :)
Are there other monitoring plugins you'd suggest?
On 17/12/2021 11:15, Loris Bennett wrote:
Hi Diego,
Diego Zuccato writes:
Hi Loris.
On 14/12/2021 14:16, Loris Bennett wrote:
spectrum, today, via our Zabbix monitoring, I spotted some jobs with
unusually high GPU-efficiencies which turned out to be doing
cryptomining :-/
What are you using to collect data for Zabbix?
--
Diego Zuccato
DIFA - Dip. di Fisica
imulator ~]$ sacct -j 791 -o "jobid,nodelist,user"
JobID         NodeList    User
------------  ----------  ---------
791           smp-1       user01
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
knowingly, not by accident), I'm afraid...
Best,
Steffen
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
only impact autodetection (so it "just" requires manual
config) or GPU jobs won't be able to start at all?
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
l get the error
you saw. Restarting slurmd on the submit node fixes it. This is the
documented behavior (adding nodes needs slurmd restarted everywhere). Could
this be what you're seeing (as opposed to /etc/hosts vs DNS)?
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi In
ith a shorter wallclock
time could be backfilled till the reservation/maintenance starts. You
can put the reservation anytime in the system but at least or before
"<maintenance start> minus <max walltime>", e.g.
scontrol create reservation=<name> starttime=<start time>
duration=<duration> user=root flags=maint nodes=ALL
Hope, that helps a
ages are listed here:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#install-prerequisites
/Ole
On 05-11-2021 15:38, Diego Zuccato wrote:
They aren't using modules so it must be something system-wide :(
But not all jobs are impacted. And it seems it's a bit random (doesn't
happen always).
I'm out of ideas, currently :(
On 05/11/2021 13:10, Ole Holm Nielsen wrote:
On 11/5/21 12:47, Diego Zuccato wr
askAffinity=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
MemorySwappiness=0
MaxSwapPercent=0
AllowedSwapSpace=0
Any ideas?
Tks.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
t
tax errors and the most common errors is already
a big help, especially for noobs :)
[OK]: All nodeweights are correct.
What do you mean by this? How can weights be "incorrect"?
If someone is interested ... Surely I am :)
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Serv
grading-slurm
Yup. That's why I upgraded the whole cluster at once.
Tks for the help.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
d node state specified").
SLURM 20.11.4.
Tks.
Diego
On 01/10/2021 21:32, Paul Brunk wrote:
Hi:
If you mean "why are the nodes still Drained, now that I fixed the
slurm.conf and restarted (never mind whether the RealMem parameter is
correct)?", try 'scontrol update nodena
<--
I also tried lowering RealMemory setting to 6, in case MemSpecLimit
interfered, but the result remains the same.
Any ideas?
TIA!
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bolo
On 20/09/2021 13:49, Diego Zuccato wrote:
Tks. Checked it: it's on the home filesystem, NFS-shared between the
nodes. Well, actually a bit more involved than that: JobCompLoc points
to /var/spool/jobscompleted.txt but /var/spool/slurm is actually a
symlink to /home/conf/slurm_spoo
y.
The explanation below is taken from the Slurm web site:
"The backup controller recovers state information from the
StateSaveLocation directory, which must be readable and writable from
both the primary and backup controllers."
Regards;
Ahmet M.
On 20.09.2021 12:08, Diego Zuccato
ue.
I'm currently in the process of adding some nodes, but I already did it
other times w/ no issues (actually the second slurmctld node has been
installed to catch the race of a job terminating while the main
slurmctld was shut down).
Anything I should double-check?
Tks.
--
Diego Zucca
right now):
RealMemory=257433 AllocMem=0 FreeMem=159610
That's probably due to buffers/caches remaining allocated between jobs.
They're handled by the OS and should be automatically freed when a
program needs memory.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Inform
IIRC we increased SlurmdTimeout to 7200.
On 06/08/2021 13:33, Adrian Sevcenco wrote:
On 8/6/21 1:56 PM, Diego Zuccato wrote:
We had a similar problem some time ago (slow creation of big core
files) and solved it by increasing the Slurm timeouts
oh, i see.. well, in principle i should not
21 12:46, Adrian Sevcenco wrote:
On 8/6/21 1:27 PM, Diego Zuccato wrote:
Hi.
Hi!
Might it be due to a timeout (maybe the killed job is creating a core
file, or caused heavy swap usage)?
i will have to search for culprit ..
the problem is why would the node be put in drain for the reas
Then when I submit one job with 8 GPUs, it will
be pending because of GPU fragmentation: node A has 2 idle GPUs, node B has 6 idle
GPUs
Thanks in advance!
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bolog
task fails and how
can I disable it? (I use cgroups)
moreover, how can the killing of a task fail? (this is on Slurm 19.05)
Thank you!
Adrian
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna
and bash?
GNU Awk 4.2.1, GNU bash, version 5.0.3(1)-release.
The job timings are printed by pestat if you use the -S, -E and -T
options. See the help info with "pestat -h".
I'll have another look on Monday for further testing (I started quite
early this morning :) ).
Tks a lot for n
"when will my job start". But pestat and slurmtop are
different tools for different uses, no need to duplicate all functionality.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
g the
slurmdbd/mariadb support is all right with no problems, but slurmctld
still does not start on boot.
Also, blade01 reported in the log is the hostname of one of the nodes.
You should probably fix /usr/lib/systemd/system/slurmdbd.service as well.
/Ole
--
Diego Zuccato
DIFA - Dip. di Fisica e
--
Diego Zuccato
DIFA - Dip. di Fisica e Astron
e. I don't
quite see how one could integrate pestat itself directly into Zabbix, as
it is more geared to producing a report, but maybe Ole has ideas :-)
How to use the collected data is one of the big open problems in IT :)
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Inform
m not really OK with it yet (for example I still can't understand
how I can exclude some metrics from a host that got 'em added by a
template... When I have enough time I'll find a way :) ). Maybe
pestat can be added to the Zabbix metrics...
--
Diego Zuccato
DIFA - Dip. di
=192, restarted slurmctld and it keeps seeing all CPUs...
What should I think?
But another problem surfaces: slurmtop seems not to handle so many CPUs
gracefully and throws a lot of errors, but that should be something
manageable...
Tks for the help.
BYtE,
Diego
On 21/07/2021 11:01, Diego Zuc
Uff... A bit mangled... Correcting and resending.
On 21/07/2021 08:18, Diego Zuccato wrote:
On 20/07/2021 18:02, mercan wrote:
Hi Ahmet.
Did you check the slurmctld log for a complaint about the host line? If
slurmctld cannot recognize a parameter, maybe it gives up processing the
whole
] _build_node_list: No nodes satisfy JobId=33808
requirements in partition b4
(str957 is the second frontend/login node that I've had to take offline
for an unrelated problem).
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
ems related to a regression in later versions...
Maybe delete Boards=1 SocketsPerBoard=4 and try Sockets=4 instead?
Already tried. Actually, that was the first thing I tried.
The pam_slurm_adopt is very useful :-)
IIUC only if you allow users to connect to the worker nodes. I don't. :)
Wiki notes could be helpful?
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#compute-node-configuration
Tks. Interesting, but I don't see pam_slurm_adopt. Other than that, it
seems very much like what I'm doing.
BYtE,
Diego
On 7/20/21 12:49 PM, Diego Zuccato wrote:
Hello all.
It'
.
I restarted slurmctld after every change in slurm.conf just to be sure.
Any idea?
Tks.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
sibly impacting other
users. Even if you just make users "pay" for the resources used by
applying fairshare, the temptation to game the system could be too big.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pic
. Maybe someone more
experienced can refine it.
No... it doesn't work...
-Original Message-
De: Diego Zuccato
Enviado el: jueves, 10 de junio de 2021 10:37
Para: Slurm User Community List ; Gestió
Servidors
Subject: Re: [slurm-users] Job requesting two different GPUs on two differen
,
--gres=gpu:GeForceRTX2070:1” because line “#SBATCH --gres=” is for each
node and, then, a line containing two “gres” would request a node with 2
different GPUs. So… is it possible to request 2 different GPUs in 2
different nodes?
Thanks.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
submitting your job.
Brian Andrus
On 6/1/2021 4:15 AM, Diego Zuccato wrote:
Hello all.
I just found that if a user tries to specify a nodelist (say
including 2 nodes) and --nodes=1, the job gets rejected with
sbatch: error: invalid number of nodes (-N 2-1)
The expected behaviour is that slurm
found conflicting info about the issue. Is it version-dependent?
If so, we're currently using 18.08.5-2 (from Debian stable). Should we
expect changes when Debian ships a newer version? Is it possible to
have the expected behaviour?
Tks.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronom
nked to
in this page.
Tks.
I upgrade Slurm frequently and have no problems doing so. We're at
20.11.7 now. You should avoid 20.11.{0-2} due to a bug in MPI.
That's really useful info.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - U
dle manually-compiled packages).
As Ole said, it's an old version. I'd love to be able to keep up with
the newest releases, but ... :(
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
ed   Reported
---------  ---------
ophcpu    81.93%    0.00%    0.00%   15.85%    2.22%   100.00%
ophmem    80.60%    0.00%    0.00%   19.40%    0.00%   100.00%
BYtE,
Diego
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informa
On 14/05/2021 08:19, Christopher Samuel wrote:
sreport -t percent -T ALL cluster utilization
"sreport: fatal: No valid TRES given" :(
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 401
t's
out of my depth, but there's a very low-volume mailing list at
ccr-xdmod-l...@listserv.buffalo.edu you could inquire at.
[1] https://github.com/ubccr/xdmod/releases/tag/v9.5.0-rc.4
On 12/05/21 13:30, Diego Zuccato wrote:
Anyway, at first glance it uses a bit too many technologies for my
taste (PHP, Java, JS...) and could be a problem to integrate into a
vhost managed by one of our ISPConfig instances. But I'll try it.
Somehow I'll make it work :)
The m
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
ess to the bare numbers is definitely a no-no :)
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
have to make some changes (re field
width: our usernames are quite long, being from AD), but first I have to
check if it extracts the info our users want to see :)
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2
or at least the data to put in a spreadsheet for
further processing)?
Tks.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
d got propagated (as implied by
PropagateResourceLimits default value of ALL).
And I can confirm that setting it to NONE seems to have solved the
issue: users on the frontend get limited resources, and jobs on the
nodes get the resources they asked for.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astro
so when I tried to limit
the memory users can use on the frontend to 1GB soft / 4GB hard, the jobs
began to fail at startup even if they requested 200G (which is available
on the worker nodes but not on the frontend)...
Tks.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Inform
On 29/03/21 09:35, taleinterve...@sjtu.edu.cn wrote:
> Why can't the loop code get the content in job_desc? And what is the
> correct way to print all its content without manually specifying each key?
I already reported it quite some time ago. Seems pairs() is not working.
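job_desc resolves its fields through an __index accessor instead of storing them as ordinary table entries, so pairs() finds nothing to iterate. A hedged workaround sketch is to keep a hand-maintained list of field names and read them one by one (the list below is only an illustrative subset):
-- job_submit.lua (sketch): log selected job_desc fields explicitly
local dump_fields = { "name", "partition", "account",
                      "time_limit", "min_nodes", "num_tasks" }
function slurm_job_submit(job_desc, part_list, submit_uid)
   for _, f in ipairs(dump_fields) do
      -- string indexing goes through the same accessor as job_desc.field
      slurm.log_info("job_desc." .. f .. " = " .. tostring(job_desc[f]))
   end
   return slurm.SUCCESS
end
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end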
--
Diego Z
n the partition.
So the definition will have to be reversed: set the partition limit to
the max allowed (1h) and limit all users except one in the assoc.
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40
On 29/01/21 08:47, Diego Zuccato wrote:
>> Jobs submitted with sbatch cannot run on multiple partitions. The job
>> will be submitted to the partition where it can start first. (from
>> sbatch reference)
> Did I misunderstand, or can heterogeneous jobs work around this