t the memory
usage reported by 'seff' is unreliable [2].
Is that indeed the case?
Cheers,
Loris
Footnotes:
[1] https://github.com/PrincetonUniversity/jobstats
[2] https://doc.dhpc.tudelft.nl/delftblue/Slurm-trouble-shooting/
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
for
individual jobs, when requested. We also don't pre-empt any jobs.
Apart from that, I imagine implementing your 'soft' limits robustly
might be quite challenging and/or time-consuming, as I am not aware that
Slurm has anything like that built in.
Cheers,
Loris
> On Wed,
n incentive to specify a shorter
wallclock limit, if they can.
'sqos' is just an alias for
sacctmgr show qos
format=name,priority,maxwall,maxjobs,maxsubmitjobs,maxtrespu%20
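For anyone who wants the same shortcut, a sketch of the alias definition
(the format string is just the one quoted above):

  alias sqos='sacctmgr show qos format=name,priority,maxwall,maxjobs,maxsubmitjobs,maxtrespu%20'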
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
tion applies only to
SchedulerType=sched/backfill. Default: 1440 (1 day), Min: 1, Max: 43200 (30
days).
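For reference, a minimal slurm.conf sketch, assuming the parameter being
quoted here is bf_window (the value is only an illustration):

  SchedulerType=sched/backfill
  # bf_window is in minutes and should cover the longest allowed wallclock limit
  SchedulerParameters=bf_window=2880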
Regards
Loris Bennett
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
e other partitions for interactive work.
This is obviously also even more make-shift :-)
Cheers,
Loris
> Thanks a lot,
> Ole
>
> --
> Ole Holm Nielsen
> PhD, Senior HPC Officer
> Department of Physics, Technical University of Denmark
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
Presumably, 'small' GPU jobs
might have to wait for resources in other partitions, even
if resources are free in 'large-gpu'. Do you have other policies which
ameliorate this?
Cheers,
Loris
[snip (135 lines)]
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
Hi,
Over a week ago I sent the message below to the address I found for the
list owner, but have not received a response.
Does anyone know how to proceed in this case?
Cheers,
Loris
Start of forwarded message
From: Loris Bennett
To:
Subject: Unable
e data points.
Cheers,
Loris
> -Paul Edmon-
>
> On 9/5/24 10:22 AM, Loris Bennett via slurm-users wrote:
>> Jason Simms via slurm-users writes:
>>
>>> Ours works fine, however, without the InteractiveStepOptions parameter.
>> My assumption is also that default v
5a * D-20146 Hamburg * Germany
>
> Phone: +49 40 460094-221
> Fax: +49 40 460094-270
> Email: be...@dkrz.de
> URL: http://www.dkrz.de
>
> Managing Director (Geschäftsführer): Prof. Dr. Thomas Ludwig
> Registered office (Sitz der Gesellschaft): Hamburg
> Amtsgericht Hamburg HRB 39784
>
> -
into the compute node:
$ ssh c001
[13:39:36] loris@c001 (1000) ~
Is that the expected behaviour or should salloc return a shell directly
on the compute node (like srun --pty /bin/bash -l used to do)?
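For reference, a sketch of the slurm.conf settings I assume are relevant
here (the commented default is my understanding, not verified):

  LaunchParameters=use_interactive_step
  # InteractiveStepOptions defaults to something like
  # "--interactive --preserve-env --pty $SHELL"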
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
a
> echo "Running on $(hostname)"
> echo "We are in $(pwd)"
>
>
> # run the program
>
> /home/arkoroy.sps.iitmandi/ferro-detun/input1/a_1.out &
You should not write
&
at the end of the above command. This will run the program in the
background, so the batch script will exit immediately and the job will
end before the program has finished.
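A sketch of how I would write it instead (the #SBATCH lines are only
placeholders; the path is the one from your script):

  #!/bin/bash
  #SBATCH --job-name=ferro-detun
  #SBATCH --ntasks=1
  #SBATCH --time=01:00:00

  echo "Running on $(hostname)"
  echo "We are in $(pwd)"

  # run the program in the foreground, so the job only ends when it does
  /home/arkoroy.sps.iitmandi/ferro-detun/input1/a_1.out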
5 005, India
> Email: a...@iitmandi.ac.in
> Web: https://faculty.iitmandi.ac.in/~arko/
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT, Freie Universität Berlin
ause you are starting 'slurmd' on the node,
which implies you do want to run jobs there. Normally you would run only
'slurmctld' and possibly also 'slurmdbd' on your head node.
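If you don't want jobs on the head node, a sketch of what I would do
(assuming the usual systemd unit names):

  systemctl disable --now slurmd   # on the head node
  # and drop the head node from the NodeName=/PartitionName= lines in slurm.conf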
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
orge
>
> --
> George Leaver
> Research Infrastructure, IT Services, University of Manchester
> http://ri.itservices.manchester.ac.uk | @UoM_eResearch
>
>
k through release notes back to 22.05.10 but can't see anything
> obvious (to me).
>
> Has this behaviour changed? Or, more likely, what have I missed ;-) ?
>
> Many thanks,
> George
>
> --
> George Leaver
> Research Infrastructure, IT Services, University of Man
tSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> grep 672204 /var/log/slurmctld
> [2024-06-04T15:50:35.627] sched: _slurm_rpc_allocate_resources JobId=672204
> NodeList=(null) usec=852
>
cluster is not really the fastest so I am planning on having users use the
> /tmp/ directory
> for speed critical reading and writing, as the OSs have been installed
> on NVME drives.
Depending on the IO patterns created by a piece of software, using the
distributed file system might be fine or a
Hi Dietmar,
Dietmar Rieder via slurm-users writes:
> Hi Loris,
>
> On 4/30/24 3:43 PM, Loris Bennett via slurm-users wrote:
>> Hi Dietmar,
>> Dietmar Rieder via slurm-users
>> writes:
>>
>>> Hi Loris,
>>>
>>> On 4/30/24 2
Hi Dietmar,
Dietmar Rieder via slurm-users writes:
> Hi Loris,
>
> On 4/30/24 2:53 PM, Loris Bennett via slurm-users wrote:
>> Hi Dietmar,
>> Dietmar Rieder via slurm-users
>> writes:
>>
>>> Hi,
>>>
>>> is it possible to have slur
'srun ... --pty bash', as far as I
understand, the preferred method is to use 'salloc' as above, and to use
'srun' for starting MPI processes.
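A minimal sketch of what I mean (the resource numbers and program name
are just placeholders):

  salloc --ntasks=4 --time=00:30:00
  srun ./my_mpi_program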
Cheers,
Loris
> Thanks so much and sorry for the naive question
>Dietmar
--
Dr. Loris Bennett
rs are important for us because we have a large number of
single core jobs and almost all the users, whether doing MPI or not,
significantly overestimate the memory requirements of their jobs.
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
f our cores.
The downside is that very occasionally nodes may idle because a user has
reached his or her cap. However, we usually have enough uncapped
users submitting jobs, so that in fact this happens only rarely, such as
sometimes at Christmas or New Year.
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
er.uid' has the value
0.0
and is thus not an integer. The only user within the Docker cluster is
'root'.
Has anyone come across this issue? Is it to do with the Docker
environment or the difference in the OS versions (Lua 5.1.4 vs. 5.3.4,
lua-posix 32 vs. 33.3.1)?
Cheers,
Loris
an specify how many jobs should run simultaneously
with the '%' notation:
--array=1-200%2
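i.e. in a batch script, with the same example values as above:

  #SBATCH --array=1-200%2   # at most 2 array tasks run at the same time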
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
cifying
> LaunchParameters=enable_nss_slurm in the slurm.conf file and put slurm
> keyword in passwd/group
> entry in the /etc/nsswitch.conf file. Did these, but didn't help either.
>
> I am bereft of ideas at present. If anyone has real world experience and can
>
be in a single partition.
Was this indeed the case and is it still the case with Slurm version
23.02.7?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
nfigless" Slurm:
https://slurm.schedmd.com/configless_slurm.html
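A rough sketch of the moving parts, as I understand them (the host name
is a placeholder):

  # slurm.conf on the controller
  SlurmctldParameters=enable_configless

  # on each compute node, point slurmd at the controller instead of a local slurm.conf
  slurmd --conf-server=ctl-host:6817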
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
Hi Kamil,
Kamil Wilczek writes:
> On 4.01.2024 at 07:56, Loris Bennett wrote:
>> Hi Kamil,
>> Kamil Wilczek writes:
>>
>>> Dear All,
>>>
>>> I have a question regarding the fair-share factor of the multifactor
>>> priority a
res and thus treated equally by the fair-share
mechanism.
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
would be much nicer if
multiple GPUs types passed to '--gres' were ORed.
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
e to configure some sort of partition QoS so that the number
>>> of
>>> jobs or cpus is limited for a single user.
>>> So far my testing always depends on creating users within the
>>> accounting database however I'd like to avoid managing each user and
>>> having to create or sync _all_ LDAP users also within Sturm.
>>> Or - are there solutions to sync LDAP or AzureAD users to the Slurm
>>> accounting database?
>>> Thanks for any input.
>>> Best - Eg.
>>>
>>
>
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
>
>
>
> FPGA*    up    infinite    1    idle    FPGA01
>
> Any pointers will help.
Why do you think that the output above is wrong?
Cheers,
Loris
> Regards,
>
> DJ
>
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
Loris Bennett writes:
> Christopher Samuel writes:
>
>> On 10/13/23 10:10, Angel de Vicente wrote:
>>
>>> But, in any case, I would still be interested in a site factor
>>> plugin example, because I might revisit this in the future.
>>
>> I don
s, I found it.
I'll have a go at creating a memory-wasted factor.
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
Hello Angel,
Angel de Vicente writes:
> Hello Loris,
>
> "Loris Bennett" writes:
>
>> Did you ever find an example or write your own plugin which you could
>> provide as an example?
>
> I'm afraid not (though I didn't persevere, because for the
or plugin to start with.
>
> Do you know of any examples that can set me in the right direction?
Did you ever find an example or write your own plugin which you could
provide as an example?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
o a range of unpreferred behaviour) and
>> provides a clear motivation to change. Could be done with QOS unless
>> you already use that in a conflicting way.
>> Gareth
>> Get Outlook for Android <https://aka.ms/ghei36>
>>
interested in knowing whether one can take into
account the *requested but unused memory* when calculating usage. Is
this possible?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
th certain restrictions, such as a
shorter maximum run-time.
What are the pros and cons of the reservation approach compared with the
above partition-based approach?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
Loris Bennett writes:
> Hi,
>
> Since upgrading to 23.02.5, I am seeing the following error
>
> $ squeue --array-unique
> squeue: unrecognized option '--array-unique'
> Try "squeue --help" for more information
>
> The help for 'squeue
p array-unique
--array-unique display one unique pending job array
Is this a regression or is something else going on?
Regards
Loris Bennett
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
ocal machine and then starts jupyter-lab. The users can then
point the browsers on their local machines to a local port and be
connected to the session on the compute node.
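Roughly, the forwarding looks like this (host names and the port are
placeholders):

  # on the user's local machine
  ssh -L 8888:node042:8888 user@login.example.org
  # then point the browser at http://localhost:8888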
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
extflow.io/
It is slightly problematic from our point of view, as it does not yet
support job arrays. However, there is development activity going on to
address this:
https://github.com/nextflow-io/nextflow/issues/1477
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
ophic failure in sbatch-file. If they fail, usually it's bad and
>> there is no
>> sense to crunch the remaining thousands of job array jobs.
>>
>> OT: what is the correct terminology for one item in job array... sub-job?
>> job-array-job? :)
>>
>> cheers
>>
>> josef
>--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
any remaining resources on the node are only available via partition A.
A second job can only start on N in partition B if no jobs on N are running
in partition A.
Regards
Loris Bennett
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
se in 30
minutes and you will have to leave. With '--deadline' you can decide
case by case.
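For example (the time is only an illustration):

  sbatch --deadline=now+1hour job.sh

Slurm then removes the job if it can no longer finish before the deadline.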
Cheers,
Loris
> Sent from my iPhone
>
>> On Jul 5, 2023, at 1:43 AM, Loris Bennett wrote:
>>
>> Mike Mikailov writes:
>>
>>> About the last point.
lt)|minutes|hours|days|weeks]]
[snip (36 lines)]
>
> Queuing system No Yes
>
> I am not sure what you mean by the last point, since 'salloc' is also
> handled by the queueing system. If the resources requested are
> currently not available, 'salloc' will
for counting
> users tasks and run them. However, I have received different
> results in cluster performance for the same task (task execution time is too
> long in case of salloc). So my question is: what is the difference
> between these two commands that can affect task performance? Thank you
> beforehand.
>
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
tiple jobs
with identical resource requirements :-(
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
Hi Reed,
Reed Dier writes:
> On Jun 27, 2023, at 1:10 AM, Loris Bennett
> wrote:
>
> Hi Reed,
>
> Reed Dier writes:
>
> Is this an issue with the relative FIFO nature of the priority scheduling
> currently with all of the other factors disabled,
> or sin
array task ID and the way the input files are
organised. We are currently not sure about the best way to do this
in a suitably generic way.
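One generic pattern we have been considering is mapping the task ID to a
line in a file list, along these lines (file names are placeholders):

  # in the job script
  INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" inputs.txt)
  ./my_program "$INPUT"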
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
ering down nodes
which are not required. What is your use-case for wanting to spread the
jobs out?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
hat partition.
We use QOS to set different priorities, but we don't use preemption.
> Since I have jobs that must run at a specific time and must have priority over
> all others, is this the correct way to do it?
For this I would probably use a recurring reservation.
Cheers,
Loris
> Thanks
>
> FR
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
er approach?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
gt;> happy about these tools. You're talking about 1 of jobs on one
>> hand yet you want fetch the status every 30 seconds? What is the
>> point of that other then overloading the scheduler?
>>
>> We're telling your users not to query the slurm too often and usually
> give 5 minutes as a good interval. You have to let slurm do its job.
>> There is no point in querying in a loop every 30 seconds when we're
>> talking about large numbers of jobs.
>>
>>
>> Ward
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
Loris Bennett writes:
> Hi David,
>
> (Thanks for changing the subject to something more appropriate).
>
> David Laehnemann writes:
>
>> Yes, but only to an extent. The linked conversation ends with this:
>>
>>>> Do you have any best practice about
Slurm.
I have every sympathy for people working on Open Source projects and am
very happy to offer assistance and have commented on the lack of support for
job arrays in Nextflow here:
https://github.com/nextflow-io/nextflow/issues/1477
This is in fact where I learned about the potential nega
s: Slurm's job limits are configurable, see this Wiki page:
>> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#maxjobcount-limit
>>
>> /Ole
>>
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
ly can't create job
arrays, and so generates large numbers of jobs with identical resource
requirements, which can prevent backfill from working properly.
Skimming the documentation for Snakemake, I also could not find any
reference to Slurm job arrays, so this could also be an issue.
Jus
y quicker?
>
> 2) Slurm developers, whether `scontrol` is expected to be quicker from
> its implementation and whether using `scontrol` would also be the
> option that puts less strain on the scheduler in general?
>
> Many thanks and best regards,
> David
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
SchedMD
employee writes
I don't think this should ever happen.
Has anyone else seen this issue?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
ou could also dump your
database, find a (virtual) machine running some appropriate RedHat-like
OS, create the RPMs for the three versions of Slurm you need, install
the first one, import your database and then do the updates. Finally
you can dump the database again and import it on your Debian 11 system.
That would still be a bit of a faff and so still may not be worth it.
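In terms of commands, the dance would be roughly the following (database
and file names are placeholders, and each Slurm upgrade step still needs
slurmdbd to be started once so that it can convert the schema):

  mysqldump slurm_acct_db > slurm_acct_db.sql      # on the old system
  mysql slurm_acct_db < slurm_acct_db.sql          # on the RedHat-like VM
  # install Slurm version N, start slurmdbd once, repeat for N+1 and N+2
  mysqldump slurm_acct_db > slurm_acct_db_final.sql
  mysql slurm_acct_db < slurm_acct_db_final.sql    # on the Debian 11 system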
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
g to take place, with each
process getting a quarter of a core on average. It is not clear that
you will actually increase throughput this way. I would probably first
turn on hyperthreading to deal with jobs which have intermittent
CPU-usage.
Still, since Slurm offers the possibility of oversubscrip
Ryan Novosielski writes:
>> On Dec 8, 2022, at 21:30, Kilian Cavalotti
>> wrote:
>>
>> Hi Loris,
>>
>> On Thu, Dec 8, 2022 at 12:59 AM Loris Bennett
>> wrote:
>>> However, I do have a chronic problem with users requesting too much
>>&
Ryan Novosielski writes:
> On Dec 8, 2022, at 03:57, Loris Bennett wrote:
>
> Loris Bennett writes:
>
> Moshe Mergy writes:
>
> Hi Sandor
>
> I personally block "--mem=0" requests in file job_submit.lua (slurm 20.02):
>
> if (job_desc.min_mem_
jobs, even if --mem=0 is specified (I guess).
Cheers,
Loris
> -------
> From: slurm-users on behalf of Loris
> Bennett
> Se
Loris Bennett writes:
> Moshe Mergy writes:
>
>> Hi Sandor
>>
>> I personally block "--mem=0" requests in file job_submit.lua (slurm 20.02):
>>
>> if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
>> s
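For completeness, my guess at how the quoted check looks when written out
in full (the elided part and the message text are assumptions on my part):

  function slurm_job_submit(job_desc, part_list, submit_uid)
     if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
        slurm.log_user("Requesting all of a node's memory with --mem=0 is not allowed")
        return slurm.ERROR
     end
     return slurm.SUCCESS
  end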
single job. How can I block a --mem=0 request?
>
> We are running:
>
> * OS: RHEL 7
> * cgroups version 1
> * slurm: 19.05
>
> Thank you,
>
> Sandor Felho
>
> Sr Consultant, Data Science & Analytics
>
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
the non-GPUs,
which do wake up properly.
Thanks for confirming that there is no fundamental issue.
Cheers,
Loris
> Best
>
> Ümit
>
>
>
> From: slurm-users on behalf of Loris
> Bennett
> Date: Thursday, 13. October 2022 at 08:14
> To: Slurm Users Mailing List
ent energy
situation, I was wondering whether this is a problem others have (had).
So does power-saving work in general for GPU nodes and, if so, are there
any extra steps one needs to take in order to set things up properly?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin
"Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)"
writes:
> On Sep 29, 2022, at 10:34 AM, Steffen Grunewald
> wrote:
>
> Hi Noam,
>
> I'm wondering why one would want to know that - given that there are
> approaches to multi-node operation beyond MPI (Charm++ comes to mind)?
>
> The
Hi Ole,
Ole Holm Nielsen writes:
> Hi Loris,
>
> On 9/29/22 09:26, Loris Bennett wrote:
>> Has anyone already come up with a good way to identify non-MPI jobs which
>> request multiple cores but don't restrict themselves to a single node,
>> leaving cores idle
t does not help you much, but perhaps something to think about
>
> On Thu, Sep 29, 2022 at 1:29 AM Loris Bennett
> wrote:
>>
>> Hi,
>>
>> Has anyone already come up with a good way to identify non-MPI jobs which
>> request multiple cores but don't restri
only one core is actually being used.
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
Hi Ole,
Ole Holm Nielsen writes:
> Hi Loris,
>
> On 9/26/22 12:51, Loris Bennett wrote:
>>> When designing restrictions in job_submit.lua, I found there is no member in
>>> the job_desc struct that can directly be used to determine the node number finally
>>> alloca
writes:
> Hi all:
>
>
>
> When designing restrictions in job_submit.lua, I found there is no member in
> the job_desc struct that can directly be used to determine the node number finally
> allocated to a job. The job_desc.min_nodes seems to
> be a close answer, but it will be 0xFFFE when user not s
ng thousands of jobs. Once we get them to use job arrays,
such problems generally disappear.
Cheers,
Loris
> Regards,
> Hermann
>
> On 9/16/22 9:09 AM, Loris Bennett wrote:
>> Hi Hermann,
>> Sebastian Potthoff writes:
>>
>>> Hi Hermann,
>>>
>>>
e normal Epilog since we wanted to
> avoid running slurm as root and I don’t have to worry
> about ownership of the output file.
Yes, good point. We should look into that.
Cheers,
Loris
> Sebastian
>
> On 16.09.2022 at 09:09, Loris Bennett wrote:
>
> Hi Hermann,
>
&
Hi Hermann,
Sebastian Potthoff writes:
> Hi Hermann,
>
> I happened to read along this conversation and was just solving this issue
> today. I added this part to the epilog script to make it work:
>
> # Add job report to stdout
> StdOut=$(/usr/bin/scontrol show job=$SLURM_JOB_ID | /usr/bin/grep
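(My guess at the elided rest of that line, plus how the extracted path
might then be used; the grep pattern and the seff call are assumptions on
my part:)

  StdOut=$(/usr/bin/scontrol show job=$SLURM_JOB_ID | /usr/bin/grep -oP 'StdOut=\K\S+')
  [ -n "$StdOut" ] && seff "$SLURM_JOB_ID" >> "$StdOut"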
be to
aggregate the times over, say, a month and provide the absolute totals
and maybe a run-to-wait ratio.
Has anyone already done anything like this?
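My starting point would probably be an sacct query like the following and
then summing the differences (the dates are placeholders):

  sacct -a -X -S 2022-08-01 -E 2022-09-01 --noheader --parsable2 \
        --format=JobID,Submit,Start,Elapsed
  # wait time per job = Start - Submit; run time = Elapsed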
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
nsure that the priorities of this user's jobs are always higher than
everyone else's?
Cheers,
Loris
> On Fri, Jul 29, 2022 at 7:00 AM Loris Bennett
> wrote:
>
> Hi Byron,
>
> byron writes:
>
> > Hi Loris - about a second
>
> What is the use-cas
is causing your slurmdbd to
time out and that is the error you are seeing.
Regards
Loris
> On Thu, Jul 28, 2022 at 2:47 PM Loris Bennett
> wrote:
>
> Hi Byron,
>
> byron writes:
>
> > Hi
> >
> > We recently upgraded slurm from 19.05.7 to 20.11.
found in the slurmctld log.
>
> Can anyone suggest how to even start troubleshooting this? Without anything
> in the logs I don't know where to start.
>
> Thanks
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
il Wilczek writes:
>> Yes, it is possible, we have 63 GPUs. But I have a problem with
>> the interpretation of this value. Specifically, I would like
>> to know how it is calculated. I couldn't find it in the docs
>> (or I'm just bad at searching :)).
>>
--
not? How many GPU cards do you have? We have 24 and our top user
for the same time period is 4382 (27.41%). This seems reasonable to me.
As there are 513 hours in the period, your user would have had to have
used around 15 cards fairly continuously. Is that not possible?
Cheers,
Loris
>
>>>> settings are for example:
>>>>
>>>> PriorityType=priority/multifactor
>>>> PriorityWeightJobSize=10
>>>> AccountingStorageTRES=cpu,mem,gres/gpu
>>>> PriorityWeightTRES=cpu=1000,mem=2000,gres/gpu=3000
>>>>
>>>>
in an array and on fairshare
to do the rest.
Cheers,
Loris
> Thanks,
>
> William Dear
>
> ------
> From: slurm-
nient wrapper around a bunch of jobs. Each element of a job
array still has its own job ID, so you can extract job data the same way
you do for a non-array job.
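For example (the job and task IDs are made up):

  sacct -j 1234567_42 --format=JobID,JobName,Elapsed,MaxRSS,State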
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
lso set
PriorityWeightFairshare=0
to remove even the effect of the CPU-usage over the day.
Cheers,
Loris
>
>
> -Original Message-
> From: slurm-users On Behalf Of Loris
> Bennett
> Sent: Tuesday, April 12, 2022 12:06 PM
> To: Slurm User Community List
> Subject:
it relates to the jobs table. Is there a way to fix the data?
Run
sacctmgr show runawayjobs
If any are found you should be offered the option of fixing them.
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
U = thekla
>>
>> However, I cannot set a partition:
>>
>> sacctmgr modify user thekla account=ops set partition=gpu
>> Unknown option: partition=gpu
>> Use keyword 'where' to modify condition
>>
>> This is not possible?
>>
>> The only solution I found to that is to delete the association and create it
>> again with the partition:
>>
>> sacctmgr del user thekla account=ops
>>
>> sacctmgr add user thekla account=ops partition=gpu
>>
>> Thank you,
>>
>> Thekla
>>
>>
>
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
already
> an
> enhancement (Bug 11591) but nothing happened so far...
>
> Regards,
>
> Alexander
>
>
> On 10.01.2022 at 11:14, Loris Bennett wrote:
>> Hi,
>>
>> Does setting 'mail_user' in job_submit.lua actually work in Slurm
>> 21.08
_get_job_req_field() contains 'mail_user'.
Cheers,
Loris
Marcus Boden writes:
> Hi Loris,
>
> I can confirm the problem: I am not able to modify the job_desc.mail_user.
> Other
> values can be modified, though.
>
> We are also on 21.08.5
>
> Best,
> Mar
rts of the plugin work, but they only read other elements of
job_desc and do not modify anything.
Am I doing something wrong?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
Hi Diego,
Diego Zuccato writes:
> Hi Loris.
>
> On 14/12/2021 at 14:16, Loris Bennett wrote:
>
>> spectrum, today, via our Zabbix monitoring, I spotted some jobs with
>> unusually high GPU efficiencies which turned out to be doing
>> cryptomining :-/
> W
ou use some kind of framework to automate the actual sending of
the nastygrams?
2. What metrics do you use for deciding whether a nastygram regarding
resource usage needs to be sent?
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
them very much. At the opposite end of the usage
spectrum, today, via our Zabbix monitoring, I spotted some jobs with
unusually high GPU efficiencies which turned out to be doing
cryptomining :-/
Cheers,
Loris
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
Hi Ole,
The new version looks good to me.
Cheers,
Loris
Ole Holm Nielsen writes:
> Hi Loris,
>
> I fixed errors in the hostnamelength calculation and formatting.
> Could you grab the latest pestat and test it?
>
> Thanks,
> Ole
>
> On 12/13/21 13:56, Loris Bennett